Toward an Integrated Program of Research
on Psychotherapy 1


J. McV. Hunt 


University of Illinois 


It is generally agreed that the number of 
persons in English-speaking America who seek 
professional help for personal problems has 
been increasing rapidly. It is often argued that 
this increased seeking represents increased 
human need, that more people are unhappy, and 
that American life has become increasingly un- 
satisfying. On the contrary, one may also argue 
that this increased seeking for such help is 
based on a growing prevalence of the hope and 
belief that such professional persons as social 
workers, psychiatrists, and psychologists can 
help. From this standpoint, it is demand, in 
the economic sense, that is changing rather than 
the number of troubled people. Distinguishing 
with measuring operations between “need” and 
“demand” is an unsolved problem of consider- 
able importance for social interpretations. 
Whether this increased seeking represents need 
or demand, however, it stresses the importance 
of understanding these personal problems, of 
increasing our knowledge of the techniques for 
helping people with them, and of the results 
obtained with existing techniques. 


1 This paper, presented as the address of the
president of the Division of Personality and Social
Psychology at the meetings of the American Psycho-
logical Association in Chicago, September 1, 1951,
was prepared while the author was director of the
Institute of Welfare Research, Community Service
Society of New York. He wishes to express his ap-
preciation to the Community Service Society, the
Rockefeller Foundation, the Carnegie Corporation,
and the Davella Mills Foundation for support of
the research program of the Institute. This research
program provided him with the experience out of
which the ideas expressed here were derived.


The most common helping method involves
one person, a professional, consciously using his
speech and gestures to develop a social relation-
ship, a social relationship aimed at providing
a corrective experience, with another person,
his client. This is a broad definition for the
helping techniques which may be subsumed 
under the general term psychotherapy. Their 
use is shared professionally by social work, 
guidance, the ministry, psychiatry, psychology, 
and occasionally by other professional groups. 
Needless to say here, this helping situation has 
provided much of our present information 
about personality development and dynamics. 
Psychotherapy has been and will continue to 
be a kind of window to the soul of man. But
it is not with psychotherapy as a research tool
that this paper is concerned. It outlines some
thoughts about what we know and do not 
know about the psychotherapies as helping tech- 
niques and especially about their results. In it 
I wish (a) to outline the nature of our in- 
formation and ignorance about their results, 
(b) to call attention to the segmental and 
piecemeal nature of nearly all of our current 
research concerning them, (c) to indicate what 
an integrated approach might be like, and (d) 
to point to both the difficulties and to the 
factors leading me to hope that an integrated 
approach may be feasible. 


What We Know of Results 


What we know about the “results” of the 
psychotherapies is very largely limited to at- 
tempts to answer what one may call the first 
evaluative question [10, p. 38], namely, is 
there improvement in the person associated 
with his receiving help? Our published in-
formation consists of percentages of the in- 
dividuals improved or improved to various de- 
grees. The most common measure of improve- 
ment is an evaluative judgment of change in 
the client or patient by a psychotherapist dur- 
ing the course of the therapeutic contact. This 
is true, for example, for psychoanalytic therapy
with individuals diagnosed psychoneurotic. As
surveyed by Knight [15] the percentage of at
least “somewhat improved” is either 92 or 66
depending on whether or not one eliminates 
from the denominator those patients who dis- 
continued treatment against the psychoanalyst’s 
advice. Wilder [31] argues, and I would 
agree, that they should not be eliminated from 
the denominator. It is also true for psycho- 
therapy in outpatient mental hygiene clinics. 
Wilder [31] reports the percentages of psy- 
choneurotics improved at the time of discharge, 
for samples from different clinics, varying from 
40 to 83 with the mode at approximately 50 
per cent. For family casework, which by our 
definition is largely psychotherapy, the per- 
centages improving range from 60 to somewhat 
more than 70 per cent [7, 10, 28]. For 
marriage counseling [8] the percentage im- 
proved approximates 67. It should be noted 
that a figure of about two-thirds improved is 
exceedingly common for the psychotherapies. 
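
The dependence of such percentages on the choice of denominator can be illustrated with a little arithmetic. The sketch below is only an illustration: the counts are hypothetical (they are not Knight's data), chosen so that they reproduce the two figures of 92 and 66, and it assumes that patients who discontinued treatment are counted as not improved.

```python
def improvement_rate(improved: int, completed: int, discontinued: int = 0) -> float:
    """Percentage improved; discontinued patients enlarge the denominator only."""
    return 100.0 * improved / (completed + discontinued)

# Hypothetical counts, not taken from Knight's survey.
improved, completed, discontinued = 92, 100, 39

# Denominator limited to patients who completed treatment.
rate_completers_only = improvement_rate(improved, completed)

# Denominator enlarged to everyone who started treatment.
rate_all_starters = improvement_rate(improved, completed, discontinued)

print(round(rate_completers_only), round(rate_all_starters))
```

The same number of improved patients yields either figure; only the decision about who belongs in the denominator differs.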

In a few instances our information consists 
of percentages of individuals showing improve- 
ment in their adjustive status as between the 
time they sought or were referred for help and 
the time of a follow-up study. Here again the 
measure is judgmental, but the judgment is 
usually made by the follow-up interviewer in- 
stead of the psychotherapist. An example is 
the group of follow-up studies of the results of 
child-guidance clinics supervised by Helen 
Witmer [33]. From these, the composite per- 
centage of children judged to be improved ap- 
proximates 75. For a third of these improving 
(25 per cent of the samples followed up), the 
original problem had disappeared and no new 
problems had appeared; the child had friends, 
his school work was consistent with his ability, 
he was a steady, reliable, and interested worker 
or student, and at home he was a friendly, co- 
operative member of the family. For the other 
two-thirds of those improving, the level of 
social adjustment at follow up was less desir- 
able. 

During the last decade, in what is probably 
the most vigorous research program in this
area, but where the emphasis is more on process 
than results, Rogers [25, 26] and his students 
have devised several measures of change in the 
content of what a client says as nondirective 
or client-centered psychotherapy progresses.
One example is a measure of change in what
Raimy [24] calls the self-reference. This par-
ticular measure, deriving from phenomenologi-
cal theory [30], has been shown by both Raimy 
[24] and Kogan and Horowitz [9] to be 
essentially the same as the Distress-Relief 
Quotient, devised by Dollard and Mowrer 
[3], which derives from learning theory. It 
is still an open question whether or not such 
measures of change in process will predict 
changes in social behavior, client reports of 
distress relieved, or change in social accept- 
ability. 

Recent years have also brought attempts to 
measure the change in persons associated with 
psychotherapy by means of such clinical tools 
as the Rorschach test. Witness, for example, 
Muench’s [22] study of nondirective psycho- 
therapy by means of the Rorschach. The 
validity of such measures for the purpose must 
also be established. Who knows how well 
changes in Rorschach will predict changes in 
social behavior, reports of distress relieved, and 
social acceptability?


What We Do Not Know 


This is an incomplete account designed only 
to indicate the general nature of what we know 
about results.2 The variation in the percentages
of clients improving may be related to several 
uncontrolled factors: 


1. The unreliability of, or differences in, the
judgmental standards by which the various
samples were judged;

2. The unreliability of diagnosis or the
variety of persons-with-problems;

3. The nature and skill of the psycho-
therapists; and

4. Of no mean import, the variety of
ways in which the samples were limited, e.g.,
(a) to all those applying for help and under-
taking to start psychotherapy [Heckman and
Stone, 7], (b) to only those for whom psycho-
therapy was completed in the judgment of the
psychotherapist [Knight, 15], or (c) to what-
ever is intermediate between these two extremes.

2 More recent studies of the results of psychother-
apy with methods not illustrated here are presented
in Psychotherapy: A symposium on theory and re-
search [21].


But I wish to call your attention to the rela-
tively obvious, but too frequently unrecog-
nized, fact that were all these factors con- 
trolled, we should still have information only 
about the first evaluative question [10, p. 38], 
namely, is there change associated with receiv- 
ing psychotherapy? There need be no causal
relationship, and the term results is logically 
an exaggeration. 

In view of the infrequency with which one 
sees or hears it, perhaps elaboration of this 
argument may be useful. The fact that im- 
provement is associated with receiving psycho- 
therapy in a fairly high proportion of individ- 
uals (two-thirds appears to be the most repeti- 
tive figure) argues that psychotherapy may be 
getting some desired results. If, on the con- 
trary, no one improved, we could conclude that 
there were no desired results. On the other 
hand, even though neurotic and psychotic dis- 
orders tend to be seen theoretically as perver- 
sions of the normal adjustive process whereby 
distress from any source tends to set in motion 
processes that mitigate that distress, most of us 
have known both neurotics and psychotics
(even schizophrenics) who appear to have re- 
covered when no one was making any attempt 
to help them. Furthermore, Landis [17] has 
found that remission rates from psychopathic 
hospitals have changed very little during the 
last century even though the amount of psy- 
chotherapeutic effort has increased consider- 
ably. Thus, until we find out how frequently 
the changes associated with psychotherapy 
would occur without it, we cannot logically 
attribute them to psychotherapy. 
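
The logic of this argument can be put as a comparison of two proportions. The following is a minimal sketch, assuming hypothetical counts for a treated and an untreated group and using an ordinary pooled two-proportion z statistic, a technique chosen here for illustration and not one named in the text.

```python
import math

def two_proportion_z(improved_a: int, n_a: int,
                     improved_b: int, n_b: int) -> tuple[float, float]:
    """Return (difference in improvement rates, pooled z statistic)."""
    p_a = improved_a / n_a
    p_b = improved_b / n_b
    # Pooled rate under the hypothesis that treatment makes no difference.
    pooled = (improved_a + improved_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return p_a - p_b, (p_a - p_b) / std_err

# Hypothetical counts: 67 of 100 treated and 60 of 100 untreated persons improve.
difference, z = two_proportion_z(67, 100, 60, 100)
print(f"excess improvement = {difference:.2f}, z = {z:.2f}")
```

With these invented figures, a two-thirds improvement rate among the treated exceeds the untreated rate by only seven points, and it is only that excess which could logically be attributed to psychotherapy.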


The Need for Control Groups 


From a methodological standpoint, it is pre- 
cisely here that the case method, indigenous to 
all psychotherapies, breaks down. For all the 
talk about idiographic and nomothetic psychol- 
ogy, I believe we must accept the dictum that 
it is impossible to repeat, and thereby test, the 
reproducibility of the psychotherapeutic action 
with one individual. We must, therefore, em- 
ploy the statistics deriving from groups of in- 
dividuals if we are to get any empirical test of 
hypotheses about causal relations between psy- 
chotherapy and change in clients or patients. 

At this point, let me digress back to the im- 
portance of knowing to what degree the psy- 
chotherapies get results. I suspect that we psy-
chologists may be responsible to a considerable
degree for the increased demand for profession-
al help with personal problems. More and
more students are taking our courses. They
hear our explanations of personality dynamics.
They are exposed to case studies, all too often
unconsciously selected because “the client lived
happily ever after” treatment. From this
standpoint, the college teaching of psychology
becomes a mighty agency for advertising psy-
chotherapeutic wares. Moreover, it is very
easy to link these wares to our humanistic tradi-
tion. It then becomes the responsibility of so-
ciety to supply psychotherapeutic aids. These
wares of ours, which we share with our associ-
ated professions, are thereby removed from
even the economic checks of the market place.
These facts emphasize our professional re- 
sponsibility to our society to know whether or 
not and to what degree the psychotherapies get 
results. In finding out, we should be even 
more concerned to learn as much as we can 
about what kind of psychotherapy works best 
for what kind of client. 

The impossibility of testing the reproducibil-
ity of psychotherapy with any single case, as I
have just pointed out, forces us to depend 
methodologically upon the use of statistics de- 
riving from groups. Control-group design is 
the most obvious answer, but it affords practi- 
cal difficulties and its piecemeal nature leaves 
considerable to be desired. Ordinary control- 
group design cannot answer questions about 
what kinds of psychotherapy work best with 
what kinds of clients. 

3 I am using the term results broadly. It need not
be limited to so-called “cures” wherein the treated
person becomes undistinguishable from healthy peo-
ple of his social and work-group. It need not even
be limited to discernible improvement in treated
persons. It can also include absence of deteriora-
tion in those for whom deterioration was expected
and for whom the treatment was aimed only at sus-
taining them through a crisis. I should want to
detect this absence of deterioration, however, as
less frequent deterioration among treated groups
than among similar groups untreated.

The practical difficulties abide in the conflict
between our humanitarian desire to help the
afflicted and the logical requirement that the
experimental (psychotherapy receiving) and
control groups be statistically equivalent. Any
planned interference with giving help to those
who are suffering and asking for help is con-
trary to our basic concern for the worth of each
individual. Yet, is there any more suitable or 
feasible way to make groups statistically equiv- 
alent than to select them randomly from a 
population? Operationally this would mean 
withholding help by arbitrary turn from some 
previously decided proportion of those seeking 
it.* The most serious attempt of this kind of 
which I know is that in the Cambridge-Somer- 
ville Youth Study. This attempt was con- 
cerned with help designed to prevent rather 
than to cure [23]. Moreover, it was not en- 
tirely successful because those in the control 
group all too often received help from other 
agencies. Withholding help appears especially 
bad to most psychotherapists because their very 
occupation almost demands that they be well 
convinced about the efficacy of the help they 
give. Thus, the desire of psychotherapists to 
be of help coupled with their professional con- 
victions stand as a serious obstacle in the way 
of a direct experimental approach to increasing 
knowledge about the results of psychotherapy. 

Nature, however, has provided ways to 
avoid this obstacle in research on psychotherapy 
that correspond somewhat with the circum- 
stances used for the testing of hypotheses in 
such fields as astronomy and geology. For in- 
stance, seldom can the centers offering psycho- 
therapy give the kind and amount of help they 
would wish to give to all the individuals who 
apply. At present, those who are judged to be 
most in need are given the best trained and 
most experienced psychotherapists, be they so- 
cial caseworkers, psychologists, or psychiatrists. 
Or those judged to be most in need receive the
larger portions of the therapists’ time. One 
need only turn skeptical of the validity of these 
diagnostic judgments of need to justify the 
process of randomizing, that is, assigning appli- 
cants to the various therapists in some arbitrary 
order. 
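
Assignment in an arbitrary order of this kind can be sketched in a few lines. The example is a minimal illustration under stated assumptions: the applicant and therapist labels are hypothetical, and shuffling followed by dealing in rotation is one simple way of randomizing, not a procedure described in the text.

```python
import random

def assign_in_arbitrary_order(applicants: list[str], therapists: list[str],
                              seed: int) -> dict[str, str]:
    """Shuffle the applicants, then deal them to therapists in rotation."""
    rng = random.Random(seed)   # seeded so the assignment can be audited later
    shuffled = applicants[:]    # copy, leaving the caller's list untouched
    rng.shuffle(shuffled)
    return {
        applicant: therapists[i % len(therapists)]
        for i, applicant in enumerate(shuffled)
    }

# Hypothetical labels, for illustration only.
applicants = [f"applicant_{k}" for k in range(1, 13)]
therapists = ["experienced_therapist", "novice_therapist", "receptionist"]
assignment = assign_in_arbitrary_order(applicants, therapists, seed=1951)
for name in applicants:
    print(name, "->", assignment[name])
```

Because the order is fixed by chance and not by a judgment of need, the groups seen by each kind of helper are equivalent on the average, which is what the comparison requires.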

* It is sometimes suggested that those who apply
but do not return for psychotherapy after the intake
interview be utilized as the control. This is not
logically feasible because, in failing to return or
in deciding not to return, these people are essen-
tially different from those who continue “seeking
help.”

In one general hospital, the medical social
workers who face such a situation are consider-
ing a randomizing procedure whereby an
arbitrary portion of patients selected by arbi-
trary turn will have the help of relatively un-
trained receptionists while the others will re- 
ceive the help of the most skilled medical social 
workers obtainable. Here the independent 
variable will be the training and experience 
(presumably skill) of those attempting to help 
patients. 

On the other hand, I see little chance of
even such “natural” substitutes for the classi-
cal experiment getting done except in a few
places where the administrator of the helping
agency is imbued with the skepticism of a sci-
entist and can thereby feel ethically free, or
better, ethically bound to conduct such a pro-
cedure rigorously. I believe this will be done
only seldom because the typical practitioner
sees in such “control-group” design an answer
only to a question for which he is convinced
that he already has the answer, and no answer
to the question for which he eagerly seeks one.
To be concretely explicit, he tends already to
be sure that the training and experience of the
therapist make a difference to the patient.
What he wants to know is: what kinds of
therapeutic approaches work best with what
kinds of clients or patients? To this last ques-
tion he sees no answer forthcoming from classi-
cal control-group design. Control-group design
can tell us, for each given type of psychothera-
py, to what degree it gets results on the whole.
It is a clumsy design, however, for answering
the question in which both psychotherapists and
personality theorists are most interested, name-
ly, what kind of therapeutic approach works
with what kind of clients or patients?


A great deal of folklore now passes for psy- 
chological science in this area. All too com- 
monly deductions are made from theoretical 
propositions which have little in the way of un- 
equivocal confirmation from empirical observa- 
tions. Because these propositions and deduc- 
tions from them are often so stated that they 
scarcely permit empirical test, or because our 
current research designs do not test them, they 
can pass as dogma. In this sense, we are faced 
with what I should like to call our various 
“schools of conviction” in psychotherapy. 


5 Shaffer [26] has already called attention to the
manner in which the various explanatory theories
indigenous to psychotherapy are constructed to pro-
tect the security of the psychotherapist and to serve
as aids in training and indoctrination.

Let me give you just one concrete example.
In one of our student-training clinics where 
psychotherapists from various “schools of con- 
viction” serve as supervisors, the convictions of 
the chief are of the orthodox psychoanalytic va- 
riety. He has dictated, on a priori grounds, that 
no patient exhibiting obsessive-compulsive 
trends is to be handled in nondirective fashion 
because “such an approach can only be damag- 
ing” to such people. Although my predilections, 
also deductive, would tend to agree with those 
of this chief, I have become highly skeptical 
about everything we think we know in this 
whole area. I have therefore asked some of my 
friends of nondirective conviction about any 
psychotherapeutic experience they may have 
had with clients showing obsessive-compul- 
sive trends. Each reports some of his most 
outstanding therapeutic successes with obses- 
sive-compulsive clients. Of course, this “I- 
cured-a-case” method proves nothing about the 
relative efficacy of psychoanalytic versus non- 
directive psychotherapy with obsessive-compul- 
sive cases, but it does call into question any 
deductively derived conviction about the neces- 
sarily injurious nature of nondirective psycho- 
therapy for such cases. 

Let me ask just a few other still empirically 
unanswered questions. 


1. Does psychotherapy aimed at the pro- 
duction of corrective experiences for specific, 
diagnostically determined personality defects 
produce improvement any more or less often 
than psychotherapy which aims only at provid- 
ing the proper social-emotional atmosphere for 
healthy personal growth? And two corollary 
questions: (a) In what ways are these psycho- 
therapies actually different in the discernible 
behavior of psychotherapists? (b) Are there
distinguishable classes of clients for which one 
system works better than the other? 


2. Do psychotherapists who exhibit in high 
degree the various characteristics and skills we 
try to teach get better results than those who 
do not? Or, a subquestion of this kind: Do 
psychotherapists who have more nearly ideal 
psychotherapeutic relationships with their cli- 
ents, as defined by Fiedler’s interesting theory 
and Q-technique, get better results than psy- 
chotherapists who have less ideal relationships 
with their clients?


3. Are the various schools of psychoanalytic
conviction differentiated by either the propor-
tion of clients in general or the proportions of
the various kinds of clients, who show im-
provement?


These are hard questions, hard because they 
call simultaneously for statistically manageable 
data in three areas: 


a. About the client and his characteristics;

b. About the psychotherapist; and

c. About the change in the client associated
with his receiving psychotherapy.


Control-group design is poorly adapted for
such questions. Yet these constitute the very 
kind of questions to which answers are most 
important from both the practical standpoint 
of practitioners and the theoretical standpoint. 
Research in this field demands that we develop 
first the methods of measurement for each of 
these areas and then that we develop a social 
and administrative organization such that we 
can have data from all three areas simultane-
ously.6 This is the theme of my song.


The Piecemeal Nature of Most Current 
Research in Personality 
Diagnosis and Psychotherapy 


Our scientific development in personality 
diagnosis and psychotherapy has been increas- 
ing markedly in both volume and quality. Let 
me mention some of the specific developments. 

6 I do not wish to deprecate the role of the indi-
vidual investigator working by himself on the prob-
lems close to his own heart and in his own way.
The sciences of today are largely the cumulative
result of such individual efforts. I believe that the
frontiers of science will continue to be pushed back
by such individual efforts, perhaps largely by such
individual efforts. Moreover, the “way of life” of
the individual investigator is one to be envied. But,
as Marquis [19] has pointed out so well, certain
problems do not lend themselves to solution by indi-
vidual effort. Such problems seem especially prev-
alent in the area of human relations. I am saying
that the problem of what kind of psychotherapy
works with what kinds of patients is one of these
which cannot be solved by individual effort.

The work of Carl Rogers and his students
has demonstrated the feasibility and value of
verbatim recording for research, provided
methods of content analysis, yielded solid
knowledge of process as psychotherapy is con-
ducted nondirectively or in client-centered
fashion, and lent a great impetus to research
in psychotherapy. It has also contributed one
indigenous theory of the psychotherapeutic
process.


The developments in the factor-analysis ap-
proach to personality description, especially the
work of Cattell [1], Eysenck [4], and Witten-
born [34], are of great interest, although I ques-
tion whether the proper measures have been
analyzed as yet to provide us with factors con- 
trolling the variance in the results of psycho- 
therapy. 

The theoretical developments resulting from 
serious application of the principles of learn-
ing theory to this area by Dollard and Miller
[2], by Mowrer [20, 21], and by others [e.g., 30]
provide a rich source of researchable hypotheses 
and a useful conceptual framework. 

The attempts to assess the change in persons
associated with their receiving psychotherapy 
are beginning to be fruitful. Here I hope the 
work of our group at the Institute of Welfare 
Research of the Community Service Society 
may be pertinent to psychotherapy generally 
even though the methods were developed in a 
social work setting to measure the results of 
social casework [10, 11]. Somewhat related 
to this in one sense is the program of research 
by McQuitty [18] aimed at a measure of social 
adjustment. 

In another area, there is the work of Fiedler 
[5, 6] which has the merit of comparing psy- 
chotherapists from three different schools of 
conviction on at least one variable in their 
practice which all agree is important, the 
quality of their relationships with clients. 
Here the empirical results show experts from 
different schools of conviction, as defined by 
reputation, to be more alike than experts and 
novices from the same schools. In view of the 
emotional heat engendered by our differences
in conviction, it would be exceedingly interest- 
ing to compare the behavior of practitioners 
from different schools on other variables. I 
wonder whether the differences in the pattern 
of vocal noises they make about their practice 
may not be greater than the discernible differ- 
ences in the practice itself. 

A projected program about to start is 
Shakow’s plan to make sound movies of psy- 
chotherapeutic interviews. These movies 
should be extremely useful. One use I par-
ticularly see for them is to determine how
closely judgments of variables from sound re- 
cordings of interviews approximate judgments 
of the same variables from sound movies of 
interviews. If it should turn out that the visual 
cues add little, we could proceed with more 
confidence in the results obtained from the 
more readily obtainable sound recordings. 

All these examples of work under way or
projected are programmatic and extensive in
nature. There are other programs and a good
many fine studies that I cannot take time to
mention and still others that I do not even
know about. We psychologists may share
professional pride in this work, but I have
actually been motivated to mention these pro-
grams by a major shortcoming which they all
share. They are too piecemeal and segmental
to provide answers to the kind of questions I 
asked a moment ago. This criticism does not 
apply to the theoretical developments, but it 
does to all the empirical programs that I have 
mentioned. 

Let me ask some questions to point up my 
meaning. Do the descriptions of the psycho- 
therapeutic process found in the content analy- 
sis of consecutive nondirective interviews hold 
also for other types of psychotherapy? How 
well are the various measures of psychothera- 
peutic progress deriving from content analyses 
of interviews correlated with changes in social
behavior? And do the subjective changes per-
sist? Do they persist more in some kinds of
clients than in other kinds? Such questions
call for the combining of the excellent methods
developed by Rogers’ group with other meas-
ures.

Some of the factor analyses of personality 
variables are directed specifically toward test- 
ing the validity of our present clinical nosology 
[e.g., Wittenborn, 34] but much of it is 
isolated and appears to be done with the idea 
of discovering the basic nature of man [e.g., 
Cattell, 1]. I wonder whether the validity of 
any factors deriving from factor analysis will 
not have to be tested by means of the degree 
to which they predict something we want to 
know. In the context of this paper, we want 
to predict the variance in measures of the 
change associated with psychotherapy. I am 
prompted to recall for you at this point the 
title of a paper by the late John G. Jenkins,
“Validity for What?” [12]. I would be the
last to argue in this connection that there are 
no basic laws of behavior, but my intuition 
argues that the place to find them is not in 
descriptive dimensions of personality. 

Let me turn to our own work on measuring 
movement in the clients of social casework. 
We have succeeded in standardizing the judg- 
ment of different workers to the extent that the 
judgments of individual workers trained on 
our Movement Scale show correlations of the 
order of +.9 with the averaged judgments of 
a group on a set of test cases. The mean inter- 
judge correlation for individual case workers 
is of the order of +.8. We have thus begun 
to overcome the lack of a judgmental standard 
for assessing the change associated with case- 
work. Use of this scale can tell us only about 
movement associated with casework, however;
it cannot, strictly speaking, tell us about the 
results of casework until it is used either in 
connection with control-group design, or in 
conjunction with a proper system of diagnostic 
classification and proper description of case- 
worker or psychotherapist behavior. 
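
The two kinds of agreement figure reported above can be computed as in the sketch below. It is illustrative only: the ratings are invented, the worker names are hypothetical, and the group average is taken over the other judges, which is an assumption since the text does not say how the averaged judgments were formed.

```python
import math
from itertools import combinations

def pearson(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation of two equally long rating lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def mean_interjudge_r(ratings: dict[str, list[float]]) -> float:
    """Average correlation over all pairs of judges."""
    pairs = list(combinations(ratings.values(), 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)

def r_with_group_average(judge: str, ratings: dict[str, list[float]]) -> float:
    """Correlate one judge with the average of the other judges (an assumption)."""
    others = [r for name, r in ratings.items() if name != judge]
    group_avg = [sum(col) / len(col) for col in zip(*others)]
    return pearson(ratings[judge], group_avg)

# Hypothetical movement ratings of six test cases by three caseworkers.
ratings = {
    "worker_a": [1, 2, 3, 4, 5, 3],
    "worker_b": [2, 2, 3, 5, 5, 3],
    "worker_c": [1, 3, 3, 4, 4, 2],
}
print(round(mean_interjudge_r(ratings), 2))
print({j: round(r_with_group_average(j, ratings), 2) for j in ratings})
```

An averaged judgment is steadier than any single judgment, so a judge will ordinarily agree more closely with the group average than with another individual judge, which is consistent with the ordering of the two figures reported.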

In connection with Fiedler’s measure of the 
quality of the psychotherapeutic relationship, 
do those psychotherapists who establish rela- 
tionships approaching the ideal get improve- 
ment in a higher proportion of their clients 
than do those who establish relationships
which fall far short of this ideal?

I have asked enough questions to indicate
how even programmatic research may fall short
if several kinds of variables are not measured
and simultaneously taken into account in con-
junction with one another. Only then can we
begin to get solid answers to the complex yet
basic questions about psychotherapy, about the
results of psychotherapy, and about what kinds
of people profit most from what kinds of psy-
chotherapy. Such data would, I believe, con-
tribute to personality theory as well as to our
practical knowledge of the helping process.


The Nature of an Integrated Design 


What would a design to answer these com- 
plex questions look like? As is common in sci- 
ence, in outline it is simple. In execution, how- 
ever, it would be exceedingly complex, so com- 
plex, in fact, that I hesitate to present my 
thoughts for fear that they may be taken for
delusions of grandeur. In outline, three kinds
of data would be required for each psycho-
therapeutic situation, defined as a client apply-
ing for and getting psychotherapeutic help.
These three kinds of data or measures are (a)
diagnostic measures of the client, his problems,
his attitudes about them and about seeking
help, his social situation, etc., (b) records of
the psychotherapeutic process to provide meas-
ures of therapist behavior and its interaction
with client behavior, and (c) measures of
change based on both psychotherapeutic process
and direct evidence of change in client comfort,
change in client behavioral productivity, and
change in the reactions of other persons toward
the client.7 The diagnostic measures and meas-
ures of therapist behavior would constitute
in combination the independent variable, and
change in the client the dependent variable.
From the sampling standpoint, this design
would call for a broad cross-sectional sample
of the people who came for help. One should
also have several representatives of each of the
various schools of psychotherapy as both novices
and as highly experienced practitioners. The
individuals seeking help should, ideally, go
randomly to all kinds of practitioners. This
in outline is one conception of an ideal, in-
tegrated, methodological approach to the study 
of the interrelated factors in psychotherapy. It 
is a conception that I have repeatedly come to 
whenever I have purposefully released myself 
from the restraints of everyday reality in order 
to dream about what the scientific method 
would mean in this area if it were taken seri- 
ously. 

Anyone who has ever done a piece of psychological research can see
difficulties: the task of developing and testing the reliabilities of
a proper set of measures for each area, the file after file of process
records, the enormous task of analyzing them, the nearly endless
calculations. Such obstacles raise painful qualms in any responsible
investigator. But these are relatively simple difficulties compared to
those one can foresee on the social side. Think of the task of getting
the cooperation of an adequate sample of psychotherapists from the
various schools. Think of getting them to alter their practice
sufficiently to get the necessary data about all their clients. Think
of overcoming their ethical concern to follow their convictions about
how the various kinds of help seekers should be treated in order to
permit a random distribution of all kinds of them to all kinds of
psychotherapists. Think of the complexity of the administrative
organization required to talk through these obstacles and to keep
practitioners and research personnel in functional collaboration.
Think of the problem of getting the large financial support necessary
over a sufficient period of time.

7 Discussion of the criteria of change here would take us too far
afield. The three sources mentioned here are elaborated in the After
Comments by Kogan and myself to the symposium on psychological
theories and the evaluation of the results of psychotherapy [Psychol.
Serv. Center J., 1950, 2, p. 135].

On the other hand, as I see it, the stakes are 
high both in human welfare and in the amount 
of money involved in giving psychotherapeutic 
services. Moreover, there are glimmers of hope 
that something approximating such an ideal 
arrangement for an integrated program of re- 
search on psychotherapy may be gradually be- 
coming feasible. It will mean the social and 
administrative organization of research on a 
large scale, but not so large as the Manhattan 
Project and perhaps not so very much larger 
than the VA project on the selection of clinical 
psychologists, a report of which has just been 
completed by Kelly and Fiske [14]. 

Moreover, such a program need not be or- 
ganized, in fact, should not be organized on a 
large scale all at once. One can use even a 
single university clinic as a pilot plant, especial- 
ly if one deliberately brings together to work 
in it psychotherapists from various schools. In 
such a setting the details of design and organi- 
zation can be worked out on a small scale. The 
plan for a proper diagnostic study before treat- 
ment can be worked out. Here present prac- 
tice would have to be modified so as to permit 
statistical manipulation of the diagnostic vari- 
ables. The procedure of turning all diagnostic 
data into ratings, as used by Kelly and Fiske 
[14] in their VA selection study, has merit. 
Such a study should provide data with which 
other data from follow-up studies of the same 
individual may later be compared as one meas- 
ure of the change associated with psychothera- 
py. Ways of short-cutting the immense labor 
of content analysis of verbatim records in order 
to measure therapist and client behavior in the 
psychotherapeutic situation should be explored. 
Intake workers or those who assign clients to 
therapists should work out both their phil- 
osophy and their procedures. It must be recog- 
nized, however, that such a pilot plant opera- 
tion in a university clinic cannot answer the 
important questions by virtue of the limited 
nature of the sample of clients. 

This is a major obstacle, for I see little 
chance of organizing a chain of clinics in 
which randomizing of all kinds of clients to 
all kinds of psychotherapists would be carried 
out. On the other hand, I believe we can 
utilize what one might term the experiments 
of nature, thereby getting the collaboration of 
existing clinics. These tend individually to be 
manned by given schools of psychotherapists. 
It is likely that each of these clinics attracts 
nearly the whole range of help seekers, al- 
though perhaps the distributions vary. If the 
collaboration of several clinics, in which the 
practitioners represent differing schools, could 
be obtained, and if they would alter their prac- 
tice enough to utilize a standardized diagnostic 
study, provide verbatim records of at least 
proper samples of therapeutic interviews, apply 
standardized measures of change in clients and 
permit follow-up study by someone other than 
the psychotherapist or the person making the 
diagnostic study, we could go a long way 
toward answering the questions I have raised. 

There are signs that getting such collabora- 
tion is not idle dreaming. There are adminis- 
trators of a few clinics and social work agencies 
scattered over the country who really want to 
know what happens in and as a result of the 
service they offer. If someone would pay for 
the research procedures extraneous to their 
services, the diagnostic studies, the recording 
of interviews, and the follow-up studies, they 
would be glad to collaborate in such a program 
as outlined. I believe several of the foundations 
would be happy to pay toward the bill 
for such research procedures if their staffs of 
scientists were convinced that the plan and the 
tools developed promised just a fair chance of 
getting answers to some of these important 
questions. 

There are also signs deriving from research 
and practice now going on that the various 
parts of the integrated plan I have outlined 
are feasible. We already conduct diagnostic 
studies of clients in a large share of clinics before 
psychotherapy begins. It is merely a task 
of standardizing these diagnostic studies to 
yield in comparable form as broad a sample of 
relevant behavioral data as feasible. The fact 
that judgments of movement in clients can be 
standardized by a process of scaling and train- 
ing supports the hope that the same could be 
done for judgments by psychotherapists of a 
list of diagnostic and process variables. Psy- 
chotherapists commonly have qualms about 
client reactions to the electrical recording of 
interviews, but a study by Kogan [16] has 
shown that the amount of concern or resistance 
in an unselected sample coming for family case- 
work service was exceedingly small. Concern 
for their clients leads psychotherapists to ques- 
tion the very idea of follow-up studies by per- 
sons other than themselves, yet our follow-up 
study of family casework at the Institute of 
Welfare Research has shown that, approached 
properly, nearly all ex-clients appear to accept 
and to approve being interviewed about the 
results of their experience. These signs lead 
me to be hopeful. 


Recapitulation 
To recapitulate, 


1. The increasing use of professional help 
with personal problems, whether it derives from 
need or demand, makes it important to 
understand both the problems and the psycho- 
therapeutic helping process. This paper has 
been focused upon the results of the latter. 

2. We know relatively little about the re- 
sults of psychotherapy. Our present informa- 
tion consists largely of the percentages of cli- 
ents or patients showing various degrees of im- 
provement associated with their receiving help. 


3. Control-group design, where the change 
is measured in randomly selected groups of 
help seekers who did receive and who did not 
receive help, could answer the question of re- 
sults for each type of psychotherapy, but such 
a design would leave unanswered a major 
share of the important questions about what 
kind of psychotherapy gets best results with 
what kinds of people seeking help. 


4. Several excellent programs of research 
in this area are now under way, but their limited 
nature makes it impossible for them to yield 
answers to many of the most important questions 
about psychotherapy. 


5. Taking into account only the demands 
of logic and the scientific method, an ideal 
design for an integrated program of research 
on psychotherapy would include, in outline, 
getting 

a. Diagnostic data about the person seeking 
help to permit the development of a system 
of classification, 

b. Data on the process of psychotherapy 
appropriate to yield significant measures of 
therapist behavior, and 

c. Measures of client change associated 
with psychotherapy. The first two of these may 
be interrelated to supply the independent variable 
while client change may be taken as the 
dependent variable. 

6. Finally, I have pointed out that the im- 
plementation of an integrated program would 
be extremely difficult, but that there are con- 
siderations arguing that it may be feasible to 
achieve something approximating such an ideal 
integrated design. It is in the hope of speeding 
that day that I have exposed you to my dream 
world for research in this area. 


Received November 7, 1951. 


In the clinical application of psychological 
tests signs and patterns have frequently been 
proposed as diagnostic aids. The Szondi Test 
[4] literally abounds in signs and patterns 
suggested as indicative of certain pathological 
conditions. It is the purpose of this study to 
determine the extent to which signs and pat- 
terns described by Szondi [4] and Deri [3] 
as characteristic of idiopathic epileptics and 
overt homosexuals actually differentiate these 
two groups. 


Procedure 


As part of another investigation [2] the 
Szondi Test was individually administered 
to each of 100 idiopathic epileptics and 100 
overt homosexuals. All of the epileptics had 
long histories of seizures, were under anticon- 
vulsive drug treatment, and were considered 
nonhomosexual. All the homosexual subjects 
fulfilled the basic criterion of having sexual 
histories either exclusively homosexual in na- 
ture or predominantly so. None were epileptic. 
All the subjects tested were single white males 
between the ages of 18 and 49, nonpsychotic, 
and not deteriorated intellectually. 

Deri [3] offers four diagnostic indicators for homosexuality and eight
for epilepsy. Szondi [4] suggests five signs for homosexuality and
describes what he considers “classic” patterns for the diagnosis of
postparoxysmal and interparoxysmal syndromes. The latter is believed
to be especially valuable for patients undergoing anticonvulsive drug
treatment, a situation prevalent among the epileptic subjects tested
in this study. In all, 31 diagnostic signs or patterns were elicited
from Szondi’s and Deri’s volumes. Together with the pertinent
quotations and references, they have been listed in Tables 1 through
5, along with the number of epileptic and homosexual subjects whose
records actually revealed these signs. The comments attributed to
Szondi have been translated from the German.

1 The epileptic test profiles were collected at Caro State Hospital,
Michigan, with the aid of Ona and Phil Margules. William Trembath,
Senior Psychologist of the Ionia State Hospital, Michigan, obtained a
major portion of the homosexual records. Additional homosexual tests
were gathered by Richard Benjamin in New York City. The New York State
Psychiatric Institute Research Project on Sexual Offenders aided the
study, through the cooperation of Dr. Zygmunt Piotrowski, by making
its own test records available. A portion of this paper is based upon
a doctoral dissertation submitted to Columbia University by one of the
authors.


Results 


In order to simplify consideration of results obtained, each table
will be considered separately, followed by a final summary of all
tables. Throughout the study the chi square statistic has been used in
the determination of significance. Yates’s correction for continuity
has been applied in those cases in which theoretical cell frequencies
are below five.
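
The chi square comparison described above can be sketched for a single sign. This is an illustration only, not the authors' own computation: it uses the + s frequencies printed in Tables 2 and 3 (68 of 100 epileptics, 33 of 100 homosexuals), and the function names, together with the tabled critical value of 6.635 for one degree of freedom at the .01 level, are assumptions introduced here.

```python
def expected_frequencies(a, b, c, d):
    """Theoretical cell frequencies of the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    return [r * k / n for r in rows for k in cols]


def chi_square_2x2(a, b, c, d, yates=False):
    """Chi square (1 degree of freedom) for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        # Yates's correction for continuity: reduce |ad - bc| by n/2.
        diff = max(diff - n / 2, 0.0)
    return n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Sign present / absent in each group of 100 subjects (+ s row).
epileptics_with_sign, homosexuals_with_sign = 68, 33
table = (epileptics_with_sign, 100 - epileptics_with_sign,
         homosexuals_with_sign, 100 - homosexuals_with_sign)

# Apply the correction only when a theoretical frequency falls below five.
use_yates = min(expected_frequencies(*table)) < 5
chi2 = chi_square_2x2(*table, yates=use_yates)

CRITICAL_01 = 6.635  # assumed tabled value, df = 1, .01 level
print(f"chi square = {chi2:.2f}, Yates applied: {use_yates}, "
      f"significant at .01: {chi2 > CRITICAL_01}")
```

With groups of 100 the theoretical frequencies are far above five for this sign, so the sketch leaves the correction off; it would switch on only for the rarer signs.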

Table 1. Of eight signs postulated by Deri as typical of idiopathic
epileptics, two (+ h, + s and + d) proved to be statistically
significant. One of these signs (+ d), however, was found in fewer
epileptics than homosexuals. It was thus significant in the
unpredicted direction. Of the remaining six epileptic signs, one was
given by more epileptics than homosexuals, three were given by more
homosexuals than epileptics, and one was given by equal numbers of the
two groups. When Deri’s four major signs for epilepsy were considered
as a pattern and the number of subjects who obtained from all to none
of these signs was tabulated, the matrix proved to be significant.


Table 1

Frequency of Deri’s Postulated Signs for Epilepsy
as Observed in 100 Epileptics and 100 Homosexuals

Sign                  Comments & References                       Epileptics  Homosexuals

+h, +s**              “Indicative of strong need for motor
                      discharge” [3, p. 81]                           64          25
[sign illegible]      “epileptic patients approaching outbreak
                      of seizure” [3, p. 141]                         38          34
+h, −s                “Counterindication for real epilepsy:
                      great motoric seizures” [3, p. 85]              15          15
+d, −m                “Seen in epileptics” [3, p. 150]                10          11
ok, op or −p          “Breaking down of ego functions . . .
                      seen in deteriorated epileptics”
                      [3, p. 215]                                      9          14
−e, +hy               “Epileptics near seizure” [3, p. 111]            3           2
+d**                  “Found relatively most frequently in
                      epileptics which may account for the
                      general slowness and ‘stickiness’ of the
                      epileptic character” [3, p. 128]                 2          14
+s, any e, ok, −m,    “Real epilepsy is associated with −s, −m,
op or −p              and a weak ego in addition to its
                      association with a changing e
                      constellation” [3, p. 96]                        2           3

+s, ok, −m,           Deri’s Major Signs for Epilepsy [3, p. 96]
op or −p**
                      All 4 signs observed in:                         2           3
                      3 signs observed in:                            20          13
                      2 signs observed in:                            40          26
                      1 sign observed in:                             32          32
                      No sign observed in:                             6          26

**Significant at the .01 level.


Table 2. Of five signs described by Szondi as the “classical
postparoxysmal syndrome,” two (+ s and ok, op) proved to be
statistically significant. One of these signs (ok, op), however, was
found in fewer epileptics than homosexuals, and was therefore
significant in the unpredicted direction. Of the remaining three
signs, two were found in more homosexual than epileptic records. When
Szondi’s five signs were treated as a pattern, the matrix did not
prove to be significant.


Table 2

Frequency of Szondi’s “Classical Postparoxysmal Syndrome”
As Observed in 100 Epileptics and 100 Homosexuals

Sign                  Comments & References                       Epileptics  Homosexuals

+s**                  “Urge to satisfy aggressive drives”             68          33
oe                    “Release of need”                               38          28
op                    “Part of epileptic ego”                         19          26
ok                    “Disintegrated ego following stupor of
                      (epileptic) attack”                             17          25
ok, op*               “Epileptic ego picture”                          1           8

+s, oe, −hy,          “Classical postparoxysmal syndrome”
ok, op                [4, p. 93]
                      All 5 signs observed in:               [illegible]  [illegible]
                      4 signs observed in:                             5  [illegible]
                      3 signs observed in:                            26          21
                      2 signs observed in:                            39          27
                      1 sign observed in:                             22          43
                      No sign observed in:                             8           7

*Significant at the .05 level.
**Significant at the .01 level.

Table 3. Of four signs postulated by 
Szondi as the “classical interparoxysmal synd- 
rome,” two (—s and —k) were statistically 
significant. Of the other two signs one was 
found in more homosexual than epileptic rec- 
ords and the other appeared in equal numbers 
of profiles for the two groups. The four signs, 
treated as a pattern, proved to be statistically 
significant. 

Table 3

Frequency of Szondi’s “Classical Interparoxysmal Syndrome”
As Observed in 100 Epileptics and 100 Homosexuals

Sign                  Comments & References                       Epileptics  Homosexuals

+s**                  “Strong aggressive impulses”                    68          33
−k**                  Part of syndrome                                67          49
−hy                   Part of syndrome                                58          61
−e                    Indicative of release of tension
                      following drug therapy                          28          28

+s, −e, −hy, −k**     Szondi’s classical interparoxysmal
                      syndrome [4, p. 94]
                      All 4 signs observed in:                         6           5
                      3 signs observed in:                            34          18
                      2 signs observed in:                            40          30
                      1 sign observed in:                             15          33
                      No sign observed in:                             5          14

**Significant at the .01 level.


Table 4. Of four signs postulated by Deri as typical of
homosexuality, two (oh and + k, + p) were statistically significant.
In one of these cases (oh), the observed frequency in homosexuals was
10, whereas by chance alone, as noted by Cohen [1], approximately 23
would be expected. When all four signs were treated as a pattern, the
matrix was not significant.

Table 5. Of five signs postulated by Szondi 
as typical of homosexuality, one (—s, —hy) was 
statistically significant. Of the remaining four 
signs, one was given by more homosexuals 
than epileptics, two were given by more epi- 
leptics than homosexuals, and one was given 
equally by the two groups. When all five signs 
were treated as a pattern, the matrix was not 
significant. 


Summary of tables. Of sixteen different2 signs postulated by Szondi
and Deri for epilepsy, five were statistically significant. In two
cases, however, frequencies were greater in the homosexual than in the
epileptic group. These signs therefore significantly differentiated
the two groups in the unpredicted direction. Of nine different signs
for homosexuality postulated by Szondi and Deri, three were
statistically significant. When the signs were treated as patterns and
the number of subjects who obtained from all to none of these signs
was tabulated, two of the five matrices were statistically
significant.

2 Signs are here considered different even when the same factor may
appear in more than one sign. Thus, + h is considered a single sign
when so postulated, and + h, −s is also treated as a single sign when
that combination is postulated.


Table 4

Frequency of Deri’s Postulated Signs for Homosexuality
As Observed in 100 Epileptics and 100 Homosexuals

Sign                  Comments & References                       Homosexuals  Epileptics

−hy                   “Many times only latent but felt as
                      dynamically strong homosexual drives”
                      [3, p. 105]                                     61          58
+m                    “Frequent in homosexuals” [3, p. 138]           36          32
oh**                  “Overt passive homosexuals” [3, p. 72]          10           1
+k, +p*               “There is usually a deep underlying
                      frustration and a strong latent or open
                      homosexuality” [3, p. 255]                       6           0

oh, −hy, +k, +p,      Deri’s Pattern
+m
                      All 4 signs observed in:                         0           0
                      3 signs observed in:                             5           1
                      2 signs observed in:                            30          20
                      1 sign observed in:                             39          48
                      No sign observed in:                            26          31

*Significant at the .05 level.
**Significant at the .02 level.


Discussion

The results presented here cannot be con- 
sidered an ultimate validation of the Szondi 
Test, nor can the statistical approach be 
universally recommended. Only two clinical 
groups were studied and the test was admin- 
istered but once. Moreover, no attempt was 
made to study the relationship of the individ- 
ual Szondi Test profiles to the specific clinical 
symptoms of the subjects. It is recognized that 
the potential value of projective instruments, 
such as the Szondi, can be realized fully only 
in the hands of an experienced and sensitive 
clinician. Notwithstanding these limitations, however, if postulated
signs or patterns are to have any diagnostic meaning, it should be
possible to demonstrate their validity in a carefully controlled
study. The failure of most of the signs to differentiate the groups in
this study would seem to argue against the routine administration of
the Szondi Test in clinical practice. The fact that two epileptic
signs were significant in the unpredicted direction adds support to
this conclusion.


Table 5

Frequency of Szondi’s Postulated Signs for Homosexuality
As Observed in 100 Homosexuals and 100 Epileptics

Sign                  Comments & References                       Homosexuals  Epileptics

−s, −hy**             “Inversion of drive goal” [4, p. 98]            21           7
+h, −s                “Inversion of sexual goal” [4, p. 98]           15          15
±p                    “Inversion of identification”
                      [4, p. 123]                                     11          18
+hy, p                “Passive-homosexual ‘E’” [4, p. 98]
                      (Shows predisposition and not
                      necessarily open manifestation)                  2           1
+d, +m                “Inversion of object choice”
                      [4, p. 123]                                      0           3

−s, −hy, ±p, +d,      Szondi’s typical syndrome of
+m                    homosexuality [4, p. 123]
                      All 5 signs observed in:                         0           0
                      4 signs observed in:                             3           4
                      3 signs observed in:                            16           8
                      2 signs observed in:                            28          28
                      1 sign observed in:                             34          41
                      No sign observed in:                            19          19

**Significant at the .01 level.

Henry P. David and William Rabinowitz


Summary 


The Szondi Test was individually admin- 
istered to each of 100 idiopathic epileptic and 
100 overt homosexual males. The extent to 
which signs postulated by Szondi and Deri for 
epilepsy and homosexuality actually differ- 
entiated the two groups was investigated. Of 
25 different signs, six were significant in the 
predicted direction and two were significant 
in the unpredicted direction. The implications 
and limitations of these findings were discuss- 
ed. 


Received September 19, 1951. 


Determining “Chance Success” When A Specific 
Number of Items Are Sorted Into Discrete 
Categories 


Frank J. Dudek 


University of Nebraska 


Numerous studies in the recent literature [1, 3, 4, 7, 8, 9] attest to
an interest in testing various hypotheses with regard to the Szondi
test items. Various designs have been utilized and many of the
experiments present rather interesting problems in the interpretation
and statistical analysis of the collected data. In connection with a
recent series of experimental studies relating to the Szondi test, to
be reported elsewhere, several of these interpretive problems have
been encountered. It was felt that the solutions to these problems
might be of general interest since they represent a rather frequently
encountered experimental situation. The experiments utilizing the
Szondi test items are excellent illustrations of the experimental
conditions where these solutions are appropriate. The sources to which
psychologists usually refer for statistical procedures do not contain
analyses of this type of problem although they are generally
considered in texts on mathematical probability [6, 10]. Cohen [2] has
presented the analysis and interpretation of one type of problem
relating to hypotheses about the chance distribution of Szondi
valences. The basic rationale of the solutions is quite similar, but
particular experimental conditions often place specific restrictions
upon the data and these restrictions must be taken into account in
analyzing and interpreting the results. Two different kinds of
problems are to be considered here.



The first problem is of the type in which an individual must pair a
given number of items with a like number of categories. The question
is: How many correct pairings can be expected “by chance” in this
situation? An example of this kind of study is Rabin’s [8], where
subjects were provided with 8 categories of response and with eight
pictures of individuals. The task was to place each picture in one of
the categories so that each category had but one picture associated
with it. Since there are 8 categories from which to choose, on the
average one might expect 1/8 of the choices to be correct. This is
correct. The mean number of “correct” pairings “by chance” is one. But
of interest also is the frequency distribution of “scores” that might
be obtained “by chance.”

It is relatively easy to solve for the likelihood that all of the
eight pictures will be correctly placed; and the reasoning might be as
follows: (a) The chance that the first picture placed will be in the
appropriate category is 1/8. (b) Now, if this event happens (and it
must happen if all pictures are to be correctly placed) there remain
only 7 categories from which to choose. Thus, the chance that the
second picture will be correctly placed is 1/7. (c) In this event
there will remain 6 categories of choice and the chance of getting the
third picture correct is 1/6 (etc., throughout the series). Thus the
chance of getting all eight pictures in the right categories (a score
of 8) will be:

$$\frac{1}{8}\cdot\frac{1}{7}\cdot\frac{1}{6}\cdot\frac{1}{5}\cdot\frac{1}{4}\cdot\frac{1}{3}\cdot\frac{1}{2}\cdot\frac{1}{1} = \frac{1}{8!} = \frac{1}{40{,}320}$$

Thus there is but one chance in 40,320 of getting a score of 8 where 8
items are sorted into 8 categories “by chance.” Another way to
consider it is that the number of permutations of 8 things (different
orders) is equal to 8!. Only one of these orders has all 8 items in
the “correct” sequence. So, only one order out of 40,320 orders yields
a score of 8.
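
The two routes to this figure, the step-by-step product and the count of orders, can be checked against one another. The short sketch below is an illustration added here, with variable names of its own choosing; it multiplies the successive chances and then enumerates every order of eight items to count those that score 8.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

N_ITEMS = 8

# Route 1: multiply the successive chances 1/8, 1/7, ..., 1/1.
p_all_correct = Fraction(1)
for categories_left in range(N_ITEMS, 0, -1):
    p_all_correct *= Fraction(1, categories_left)

# Route 2: enumerate every order and count those with all items in place.
perfect_orders = sum(
    1
    for order in permutations(range(N_ITEMS))
    if all(item == slot for slot, item in enumerate(order))
)
total_orders = factorial(N_ITEMS)

print(p_all_correct, perfect_orders, total_orders)
```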

It can be seen that this is an instance of 
what might be called “conditional probabil- 
ity.” The probability of the second event is 
“conditional,” i. e., it depends upon what hap- 
pened during the first event; the probability of 
the third event must take into consideration 
the first two, etc. This feature makes the prob- 
lem unique. It is in marked contrast to the 
more common case where it is considered that 
each event is independent of all others. In the 
latter situation, it is as if one had an infinite 
number of items from which to choose—or, it 
would be analogous to the situation where, 
after each “draw” or choice is made, the item 
is replaced so that it may again appear. In 
this event the likelihood of an item being 
drawn remains constant (1/8 in the illustrative 
problem) and what happened on the first, 
or any previous draw has no influence upon 
the “chances” of the second or any subsequent 
choice. 
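
The contrast between the two situations can be made concrete with a small sketch. It is an illustration under an assumption introduced here: that the "independent" case is modeled as eight draws with replacement, each correct with constant chance 1/8, so that scores follow the ordinary binomial. The names are hypothetical.

```python
from fractions import Fraction
from math import comb, factorial

N = 8

# Matching without replacement: each chance depends on the earlier placements.
p_all_dependent = Fraction(1, factorial(N))

# Independent draws: the chance of a correct choice stays at 1/8 throughout.
p_all_independent = Fraction(1, N) ** N


def binomial_score(r):
    """Chance of exactly r correct choices when all N choices are independent."""
    return comb(N, r) * Fraction(1, N) ** r * Fraction(N - 1, N) ** (N - r)


print(p_all_dependent, p_all_independent, binomial_score(7))
```

Under independence a score of 7 has a small but positive chance, whereas in the matching problem seven correct pairings force the eighth to be correct as well.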


To return to the problem of the distri- 
bution of 8 items into 8 categories it is clear 
that we do not have independent events mak- 
ing up the series of choices. Thus, we must con- 
sider every possible “condition” or order of 
cards. The total number of orders of 8 differ- 
ent items is, as indicated above, 8! or 40,320. 
One of these orders is that of 1-2-3-4-5-6-7-8, 
and this would give a score of 8. Now, it should 
be apparent that it is not possible to get a score 
of n—1, i.e., one less than the number of items 
sorted. If n—1 items are correctly paired then 
the last one must be correct. For the remaining possibilities a
general formula has been worked out (see, for example, Plummer [6, pp.
20-21]). The formula is:

$$X_r^n = \frac{1}{r!}\left[\frac{1}{2!} - \frac{1}{3!} + \frac{1}{4!} - \cdots + \frac{(-1)^{n-r}}{(n-r)!}\right] \qquad (1)$$

$X_r^n$ is the chance of getting a “score” of exactly r successes
where n different items are distributed into n different categories.
It should be noted that the term in brackets is extended only to that
point where the denominator of the last term ends the series; and that
the last term is written with a plus sign no matter what the sign of
the previous term, its own sign being carried by the factor
$(-1)^{n-r}$. An example will make the application of the formula
clear.

Suppose that we took one set (eight pictures each representing one
diagnostic category) of Szondi pictures and placed them randomly into
8 cells (representing the 8 diagnostic categories). What is the chance
that we will find no picture matching its diagnosis, i.e., zero
successes? Substituting in formula (1) we obtain:

$$X_0^8 = \frac{1}{0!}\left[\frac{1}{2!} - \frac{1}{3!} + \frac{1}{4!} - \frac{1}{5!} + \frac{1}{6!} - \frac{1}{7!} + \frac{(-1)^8}{8!}\right]$$

and this is equal to

$$\frac{1}{1}\cdot\frac{20160 - 6720 + 1680 - 336 + 56 - 8 + 1}{40320} = \frac{14833}{40320}$$

The chances of finding 3 successes would be:

$$X_3^8 = \frac{1}{3!}\left[\frac{1}{2!} - \frac{1}{3!} + \frac{1}{4!} - \frac{1}{5!}\right] = \frac{1}{6}\left[\frac{1}{2} - \frac{1}{6} + \frac{1}{24} - \frac{1}{120}\right] = \frac{44}{720} \text{ or } \frac{2464}{40320}$$

and, the likelihood of getting 5 successes would be:

$$X_5^8 = \frac{1}{5!}\left[\frac{1}{2!} - \frac{1}{3!}\right] = \frac{1}{120}\left[\frac{1}{2} - \frac{1}{6}\right] = \frac{2}{720} \text{ or } \frac{112}{40320}$$


The formula is general and can be applied 
to any number of items. Table 1 tabulates the 
number of successes out of the total possible 
orders for situations involving from 3 to 10 
items. In boldface are presented the values in 
terms of proportion of the total number of 
orders possible. 
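
Formula (1) lends itself to a direct check. The sketch below is an illustration, not the author's own procedure: the function names are hypothetical, and the series is started at the terms 1/0! and 1/1!, which cancel, so that the same expression also covers the scores n and n−1 that the text treats separately. Exact fractions are used so that each entry of a table row comes out as a whole number of orders.

```python
from fractions import Fraction
from math import factorial


def chance_of_score(n, r):
    """Chance of exactly r correct pairings when n items go into n categories."""
    series = sum(Fraction((-1) ** k, factorial(k)) for k in range(n - r + 1))
    return series / factorial(r)


def table_row(n):
    """Number of orders giving each score 0..n, out of the n! possible orders."""
    total_orders = factorial(n)
    return [int(chance_of_score(n, r) * total_orders) for r in range(n + 1)]


for n_items in range(3, 11):
    print(n_items, table_row(n_items), factorial(n_items))
```

Dividing each frequency by n! gives the proportions that accompany the frequencies in the table.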

Table 1
Frequencies and Proportions of Correct Pairings When N Items
Are Distributed into N Categories
(For each number of items the first line gives frequencies and the
second line, in boldface in the original, gives proportions.)

No. of  Total                                                “SCORE”
items    (n!)         0         1         2         3         4         5         6         7         8         9        10

3           6         2         3         0         1
                  .3333     .5000         -     .1667
4          24         9         8         6         0         1
                  .3750     .3333     .2500         -     .0417
5         120        44        45        20        10         0         1
                  .3667     .3750     .1667     .0833         -     .0083
6         720       265       264       135        40        15         0         1
                  .3681     .3667     .1875     .0556     .0208         -     .0014
7        5040      1854      1855       924       315        70        21         0         1
                  .3679     .3681     .1833     .0625     .0139     .0042         -     .0002
8       40320     14833     14832      7420      2464       630       112        28         0         1
                  .3679     .3679     .1840     .0611     .0156     .0028     .0007         -   .000025
9      362880    133496    133497     66744     22260      5544      1134       168        36         0         1
                  .3679     .3679     .1839     .0613     .0153     .0031    .00046    .00010         -  .0000028
10    3628800   1334961   1334960    667485    222480     55650     11088      1890       240        45         0         1
                  .3679     .3679     .1839     .0613     .0153     .0031    .00052   .000066   .000012         - .00000028


Several interesting features of this table might be pointed out. (a)
The mean of each distribution is one. Thus, no matter how many items
are distributed (providing, of course, that there are as many
categories as items distributed) one would expect on the average but
one “correct” pairing “by chance.” (b) With an odd number of items to
be distributed there is always one more way of getting a “score” of
one than of obtaining a score of zero. With an even number of items
distributed there is always one more way of getting a score of zero
than a score of one. (c) Of incidental interest is the observation of
the mathematicians that as the number of items becomes large the p of
getting all items wrong (no successes) approaches the quantity $1/e$,
i.e., $1/2.718\ldots = .3679$. From the table it is seen that this is
true even though relatively few items are involved.

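
The three observations can be verified numerically. The sketch below is an illustration that takes a different but equivalent counting route from formula (1), and that route is an assumption introduced here: choose which r items are correctly paired, then count the orders of the remaining n−r items in which none falls in its own place. The function names are hypothetical.

```python
from math import comb, exp, factorial


def orders_with_none_correct(m):
    """Number of orders of m items in which no item falls in its own place."""
    if m == 0:
        return 1
    previous, current = 1, 0  # values for 0 items and for 1 item
    for k in range(2, m + 1):
        previous, current = current, (k - 1) * (current + previous)
    return current


def ways(n, r):
    """Number of orders of n items giving exactly r correct pairings."""
    return comb(n, r) * orders_with_none_correct(n - r)


for n in range(3, 11):
    mean_score = sum(r * ways(n, r) for r in range(n + 1)) / factorial(n)
    difference = ways(n, 1) - ways(n, 0)       # +1 for odd n, -1 for even n
    p_none = ways(n, 0) / factorial(n)
    print(n, mean_score, difference, round(p_none, 4), round(exp(-1), 4))
```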
The second problem is that of the type of
distribution one might expect when there are
several sets of similar items placed into a
specified number of categories (but all sets are
intermixed in a random fashion). This, it is
seen, is a kind of extension of the previous
problem where we had but one set of n items
to be distributed into a like number of
categories. This second problem might be
illustrated in terms of the Szondi cards in the
following way. Suppose subjects were given
the entire set of cards (in random order) and
the task was to place these cards into 8
categories, with the specification that each
category must have 6 and only 6 cards associated
with it. What kind of distribution of “scores”
might be expected “by chance”? Or, to put
the problem in terms of a common deck of
playing cards: Suppose one took a deck of
cards, well shuffled, and turned them up one
at a time. As they are turned up one calls the
sequence of cards, i.e., A, 2, 3, ..., J, Q, K;
A, 2, 3, ..., etc., for all 52 cards. If a
response were scored as “right” each time a call
corresponded with the turning up of the card
(i.e., as you called a “three,” a “three”
actually was turned up) what kind of distribution
of “scores” would one obtain “by chance”?

This problem is somewhat more complex
than the first and no formulation for the
“general case” (like formula [1] for the first
type of problem) was found in a casual survey
of texts on probability. However, it seems
clear that the expectancies are not similar to
the first type of experiment, for this situation
does not represent mere replications of that
design.

The number of different orders possible in
a situation like this is given by Whitworth
[10, p. 30]. In general, the rule is that the
number of different orders of n cards, where
a are of one sort, b of another sort, ... and k
of another sort, is given by n! / a! b! ... k!
Or, in the case of 48 cards (where 6 are of
one sort, 6 of another sort, ... up to 6 of
the eight different kinds) one has 48! / (6!)^8,
or about 1.365 X 10, different orders. It

Table 2

Distributions of “Scores” Obtained When Sorting Several Sets of
Items into a Specified Number of Categories

(The first three pairs of columns are for several sets of 3 items each
distributed equally into three categories; the last pair is for 2 sets of
4 items distributed into four categories. For each score the upper line
gives frequencies and the lower line gives proportions.)

         2 sets of 3 items   3 sets of 3 items   4 sets of 3 items   2 sets of 4 items
“Score” Empirical  Binomial Empirical  Binomial Empirical  Binomial Empirical  Binomial
              (2/3 + 1/3)^6       (2/3 + 1/3)^9      (2/3 + 1/3)^12       (3/4 + 1/4)^8
12                                                      1         1
                                                   .00003   .000002
11                                                      0        24
                                                        -    .00005
10                                                     48       264
                                                    .0014     .0005
9                                   1         1       128      1760
                                .0006    .00005     .0037     .0033
8                                   0        18       684      7920         1         1
                                    -     .0009     .0197     .0149     .0004   .000015
7                                  27       144      1728     25344         0        24
                                .0161     .0073     .0499     .0477         -     .0004
6               1         1        54       672      3936     59136        24       252
            .0111     .0014     .0321     .0341     .1136     .1113     .0095     .0038
5               0        12       189      2016      6336    101376        64      1512
                -     .0165     .1125     .1024     .1892     .1908     .0254     .0231
4              12        60       324      4032      7947    126720       246      5670
            .1333     .0823     .1929     .2048     .2294     .2384     .0976     .0865
3              16       160       435      5376      7136    112640       456     13608
            .1778     .2195     .2589     .2731     .2059     .2120     .1810     .2076
2              27       240       378      4608      4536     67584       742     20412
            .3000     .3292     .2250     .2341     .1309     .1272     .2944     .3115
1              24       192       216      2304      1824     24576       660     17496
            .2667     .2634     .1286     .1171     .0526     .0462     .2619     .2670
0              10        64        56       512       346      4096       327      6561
            .1111     .0878     .0333     .0260     .0100     .0077     .1298     .1001
Total          90       729      1680     19683     34650    531441      2520     65536

would be a most tedious task to determine the 
“scores” for all possible orders. 

One possibility, somewhat intuitively
reached, is that the binomial expansion
(p + q)^n (where p is the probability of a
response being wrong, q is (1 - p), and n is the
number of cards or items being considered)
should provide a suitable model. However, the
binomial assumes that the various events are
independent, and clearly we do not have an
unlimited supply of items from which to draw,
nor do we replace those items once drawn;
rather, the items are “used up” in the course
of the trials.

To obtain information as to the type of
distribution yielded by these situations, it was
decided to determine the actual distributions
that would result for some smaller numbers
of cards and compare these with the
distributions obtained by the binomial expansion.
Thus, for 2 sets of 3 items each there are
n! / a! b! c! = 6! / 2! 2! 2! = 90 different
orders for the six cards where two are of one
sort, two are of another sort, and two are of a
third sort. It is not too difficult to ascertain
the scores which would be obtained for these
different orders. For 9 items comprised of 3
sets of 3 items each there would be
9! / 3! 3! 3! = 1680 different orders; and for
12 cards consisting of 4 sets of 3 items each
there would be 12! / 4! 4! 4! = 34,650 different
orders. The number of possible orders increases
rapidly as additional “sets” of items are added.
The distributions of “scores” for the various
situations mentioned were determined and these
are presented in Table 2. The probabilities as
obtained from the appropriate binomial are also
presented. These distributions are compared
graphically in Figure 1.

Fig. 1. Comparisons of empirical and binomial distributions when two,
three, or four sets of three items are distributed into three categories.

These data show that as the number of
“sets” of items involved (and the total number
of cards) increases the binomial seems to
approximate the empirical probabilities more and
more closely. With only 6 items (2 sets of 3
items each) the approximation is not too close;
but if one adds another set of 3 items (for a
total of 9 items), the discrepancy becomes
smaller; and with no more than 12 cards (4
sets of 3 cards each) there is comparatively
little discrepancy between the two types of
distributions. Thus the inference seems warranted
that as the number of items (and number of
sets) increases the binomial expansion
provides a fairly close approximation to the
expected distribution. 
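
The enumeration described above is small enough to reproduce by machine. The following Python sketch is an added illustration under stated assumptions: the function names are hypothetical, the "caller" is assumed to name the kinds in a fixed repeating sequence, and the brute-force listing of orders is practical only for the smaller cases (6 or 8 cards). It counts the distinct orders by the rule n! / a! b! ... k!, tallies the score for every distinct order, and gives the matching binomial frequencies.

```python
from collections import Counter
from itertools import permutations
from math import comb, factorial


def count_orders(sets: int, kinds: int) -> int:
    """Distinct orders of sets*kinds cards when each kind occurs `sets` times."""
    return factorial(sets * kinds) // factorial(sets) ** kinds


def empirical_scores(sets: int, kinds: int) -> Counter:
    """Tally the number of matches over every distinct order of the cards."""
    calls = list(range(kinds)) * sets          # fixed sequence of calls
    tally = Counter()
    for order in set(permutations(calls)):     # set() removes duplicate orders
        tally[sum(c == o for c, o in zip(calls, order))] += 1
    return tally


def binomial_scores(sets: int, kinds: int) -> dict:
    """Frequencies from the binomial with P(right) = 1/kinds, out of kinds**n."""
    n = sets * kinds
    return {s: comb(n, s) * (kinds - 1) ** (n - s) for s in range(n + 1)}


if __name__ == "__main__":
    print(count_orders(2, 3), count_orders(3, 3), count_orders(4, 3))
    print(sorted(empirical_scores(2, 3).items(), reverse=True))
    print(sorted(binomial_scores(2, 3).items(), reverse=True))
```

For 2 sets of 3 items the tallies agree with the first two columns of Table 2; the larger cases are better handled by a counting argument than by listing every order.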

The two kinds of problems, or experimental 
designs, considered here are types which might 
frequently be utilized to advantage. Experi- 
ments on the Szondi test in which the various 
cards are sorted or classified into categories 
exemplify one area of experimentation where 
this type of analysis would be especially ap- 
propriate. A much more adequate interpreta- 
tion of data is possible when one is able to con- 
sider not only the mean number of successes 
by chance, but also the total distribution which 
might be expected by chance. However, this
type of analysis is suitable in all instances in
which a specific number of items are sorted
into discrete categories.


Received December 17, 1951.

Normative Data Obtained in the House-to-House
Administration of a Psychosomatic Inventory¹


Robert L. Thorndike, Elizabeth Hagen 


Teachers College, Columbia University 


and Raymond A. Kemper 


University of Louisville 


In one of a series of surveys being carried
out under a contract with the Human Resources
Research Center of the Air Force's Air
Training Command, a brief inventory dealing
with psychosomatic symptoms was administered
to a sample of 1005 adult males in Louisville,
Kentucky. The over-all purpose of the Air
Force project was to explore methodological
problems in testing representative samples of
the adult population. The purpose of this
particular survey was to determine how well
people would cooperate in responding to the
type of item which is included in the typical
personality inventory when approached directly
by an interviewer in their own residence
unit. A subsidiary purpose was to compare
responses made orally to an interviewer with
responses recorded on a “secret ballot” which
was marked by the respondent and then placed
in a sealed “ballot box” which the interviewer
carried. The objective of the present article
is to make generally available the normative
data resulting from this survey.

¹This research was supported in part by the
United States Air Force under Contract AF
33(038)-13474, monitored by Headquarters, Human
Resources Research Center. Permission is
granted for reproduction, publication, use, and
disposal in whole or in part by or for the United
States Government.


Table 1

Per cent Giving “Maladjusted” Response for Each Socioeconomic and Methods Subgroup

                                                            White A and B   White C and D       Negro
                                                            Inter-  Secret  Inter-  Secret  Inter-  Secret
Item                                                          view  Ballot    view  Ballot    view  Ballot
 1. Is your appetite good?                                     1.5     1.0     3.2     2.2     6.2     2.4
 2. Do you suffer from frequent headaches?                     6.9     6.9    15.2    10.0    11.2    14.6
 3. Do you often have spells of dizziness?                     3.0     3.0     5.5     5.8    13.8       3
 4. Do you usually feel fresh and rested in the morning?      13.9    16.3    18.9    22.4    23.8     ...
 5. Are you bothered by an upset stomach?                      9.9    11.4    12.0    14.8    12.5    15.9
 6. Do you often have difficulty in breathing?                 2.0     6.9     9.2     9.4    10.0     6.1
 7. Does your hip or back bother you?                         18.8    13.9    18.0    20.6    20.0    25.6
 8. Do you get tired easily?                                    10    20.8    15.2    22.9    16.2    15.9
 9. Do you frequently feel faint?                              1.0     1.0     2.8     1.8     3.8     4.9
10. Do you usually sleep well at night?                        7.9     7.9     8.3     9.0     7.5    11.0
11. Do you have spells of feeling hot or cold?                 3.5      10     7.4       4    10.0     8.5
12. Are you often bothered by thumping of ...                  3.0     6.9     7.8    13.4     8.8     ...
13. Have you frequently suffered from constipation?            4.9    13.4     9.7    15.7    10.0    22.0
14. Do you frequently get attacks of nausea?                   0.5     4.5     6.9     ...     8.8     4.6
15. Does the sight of blood upset you?                           4    11.9     9.2    12.5     8.8     3.7
16. Do you sometimes have nightmares?                         12.9    10.9    16.6    17.0    33.8    29.3
17. Can you usually relax easily?                             19.8    21.3    14.3    18.4     8.8     9.8
18. Are you easily upset or irritated?                        18.8    28.8    32.3    28.7    22.5    24.4
19. Were you ever troubled by stammering?                      5.4     6.0     5.5     6.3    10.0     7.3
20. Have you had to go to many doctors for treatment?         10.4      7.    15.2     8.5    12.5    11.0
21. Are you considered a nervous person?                      16.8    21.8    27.6    27.3    26.2    28.0
22. Do you generally feel well and happy?                      4.0     7.4     4.1     8.1    12.5     6.1
    N                                                          205     200     220     220      80      80


The instrument used in the present study,
which was called a “health survey,” consisted
of 22 items chosen because they all seemed
plausibly to deal with physical health and
because they had been found during World War
II to differentiate sharply between combat
fatigue and control cases. The actual questions
will be found in Table 1. When presented by
interview, each question was read to the
respondent and the response recorded by the
interviewer. In the secret ballot the questions
were presented in the following format:


Is your appetite good?

Yes      No      Or other comments


Qualified “yes” and “no” answers were treated
simply as “yes” and “no” responses. When
a comment only was made on the ballot, it
was treated as an omission. Similarly, some
responses recorded by the interviewers which
seemed too ambiguous to classify as either “yes”
or “no” were treated as omissions. Considering
the total group of responses, approximately 3
per cent were qualified responses and 2.5 per
cent omissions.

The survey was introduced to the respondent
as a government project pertaining to health,
designed to “find out what ailments are
common here in Louisville.” The choice of
Louisville, Kentucky, as a location was in part
based on the availability of a field survey
organization in that community. However, in
terms of location, general educational level, and
the like, Louisville is believed to be as good a
choice as almost any single city for representing
urban America.


Table 2

Per Cent Giving “Maladjusted” Response for Combined Methods Groups
and Combined Socioeconomic Groups

                                                            Combined Methods Groups   Combined Socio-
                                                                                      economic Groups
                                                              White   White           Inter-  Secret   Grand
Item                                                          A & B   C & D   Negro     view  Ballot   Total
 1. Is your appetite good?                                     1.2     2.7     4.3     3.0     1.8     2.4
 2. Do you suffer from frequent headaches?                     6.9*   12.6*   12.9    11.1     9.8    10.4
 3. Do you often have spells of dizziness?                     ...     5.6    10.6     5.7     5.0     5.3
 4. Do you usually feel fresh and rested in the morning?      14.8*   20.6*   21.6    17.4    20.0    18.7
 5. Are you bothered by an upset stomach?                      10.    13.4    14.2    11.1    13.8    12.5
 6. Do you often have difficulty in breathing?                 4.4*    9.3*    8.0     6.5     8.0     7.3
 7. Does your hip or back bother you?                         16.4    19.3    22.8    18.4    19.0    18.7
 8. Do you get tired easily?                                   ...    19.0    16.0    13.2*   21.2*   17.2
 9. Do you frequently feel faint?                              1.0     2.3     4.4     2.2     2.0     2.1
10. Do you usually sleep well at night?                        7.9     8.6     9.2     7.9     9.0     8.5
11. Do you have spells of feeling hot or cold?                 3.8     6.4     9.2     6.1     5.4     5.8
12. Are you often bothered by thumping of ...                  5.0*   10.6*    6.8     5.9*    9.6*    7.8
13. Have you frequently suffered from constipation?            9.2    12.7    16.0     7.7*   16.0*   11.9
14. Do you frequently get attacks of nausea?                   2.5     7.0*   11.7     4.6     7.4     6.0
15. Does the sight of blood upset you?                         8.6    10.8     6.2     7.5    11.0     9.3
16. Do you sometimes have nightmares?                         11.9*   16.8*   31.6*   17.6    16.8    17.2
17. Can you usually relax easily?                             20.6    16.4*    9.3*   15.3    18.4    16.9
18. Are you easily upset or irritated?                         3.8*   30.5*   23.4    24.8    28.4    26.6
19. Were you ever troubled by stammering?                      5.7     5.9     8.6     6.1     6.6     6.4
20. Have you had to go to many doctors for treatment?          8.9    11.8    11.8    12.7*    8.6*   10.6
21. Are you considered a nervous person?                      19.3    27.4*   27.1    22.8    26.0    24.4
22. Do you generally feel well and happy?                      5.7     6.1     9.3     5.4     7.6     6.5
    N                                                          405     440     160     505     500    1005

*Significant at .05 level.


Table 3

Frequency Distribution of “Maladjusted” Responses

Number of            Frequency
Responses     Interview   Secret Ballot
    0            156           122
    1             97            97
    2             85            80
    3             59            48
    4             25            46
    5             25            35
    6             12            23
    7             16            21
    8             13             7
    9              7             6
   10              5             7
   11              -             1
   12              1             5
   13              -             1
   14              2             -
   15              1             -
   16              -             -
   17              1             1
    M           2.26          2.65
   SD           2.66          2.73
    N            505           500


As indicated above, the questionnaire was
presented in a direct door-to-door survey. The
city was roughly classified into socioeconomic
categories on the basis of neighborhood
reputation and prevailing house type. Quotas were
allocated to each category on the basis of
prevalence in this community. From among all
those streets falling within a socioeconomic
category, certain streets were selected at
random. Interviews were carried out at each
house on the street at which a respondent of
the appropriate age and sex could be located
until the quota for that street was filled. No
callbacks were made at houses where nobody
was found at home. The questionnaire was
administered only to males judged to be
between 18 and 45 who were at home at the
time the call was made (sometime on March
10 or 11, 1951), but age was checked by
questioning the respondent. At half the houses
in each socioeconomic group, the oral interview
procedure was used. At the other half, the
secret ballot was used. 
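
The M and SD rows of Table 3 follow directly from its frequency columns. The short Python check below is an added illustration; the names are hypothetical, and the use of the population form of the standard deviation (division by N) is an assumption, since the article does not say which form was computed.

```python
from math import sqrt

# Frequencies from Table 3, indexed by number of "maladjusted" responses (0..17).
INTERVIEW = [156, 97, 85, 59, 25, 25, 12, 16, 13, 7, 5, 0, 1, 0, 2, 1, 0, 1]
SECRET_BALLOT = [122, 97, 80, 48, 46, 35, 23, 21, 7, 6, 7, 1, 5, 1, 0, 0, 0, 1]


def summarize(freqs):
    """Return (N, mean, SD) for scores 0, 1, 2, ... with the given frequencies."""
    n = sum(freqs)
    mean = sum(score * f for score, f in enumerate(freqs)) / n
    variance = sum(f * (score - mean) ** 2 for score, f in enumerate(freqs)) / n
    return n, mean, sqrt(variance)


if __name__ == "__main__":
    for label, freqs in (("Interview", INTERVIEW), ("Secret Ballot", SECRET_BALLOT)):
        n, mean, sd = summarize(freqs)
        print(f"{label}: N={n}  M={mean:.2f}  SD={sd:.2f}")
```

A reader applying the intact 22-item set to another group could pass that group's frequencies to the same function and compare the result with these norms.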

The final classification of socioeconomic level
of respondents was based upon the three factors
of neighborhood reputation, character of the
specific house, and occupation of the
respondent. These were weighted and combined
according to the procedure developed by
Warner.² The quotas by socioeconomic level were:
white A (upper), 10 per cent; white B
(upper-middle), 30 per cent; white C (lower-middle),
40 per cent; white D (lower), 5 per cent; and
Negro, 15 per cent.

²Warner, W. L., Meeker, M., & Eells, K. Social
class in America, a manual of procedure for the
measurement of social status. Chicago: Science
Research Associates, 1949.

Two types of results are presented in this
report. In Tables 1 and 2 are reported the
percentages of persons giving the presumably
“maladjusted” response to each item. Data are
presented for separate socioeconomic and
methods subgroups in Table 1, while in Table 2 the
results are shown for various combinations of
subgroups. Tests for significance of differences
were made for certain of the combined groups
in Table 2, as follows:

a. Socioeconomic A and B white vs.
socioeconomic C and D white,

b. Negro vs. socioeconomic C and D white,
and

c. Interview vs. secret-ballot for all
socioeconomic and racial groups combined.

Differences significant at the .05 level are
indicated by asterisks in the table. The
significant differences are not numerous, but some of
them suggest interesting group and
methodological differences. Some indication of the
stability of the differences can be obtained
from an examination of Table 1.

Tables 1 and 2 provide research workers
with data on the single items. Thus, if they
wish to use one or more of these items with
other groups, they can see how their results
compare with this cross-section sample of
adult males in Louisville.

Table 3 provides frequency distributions
for the number of “maladjusted” responses
given to the complete set of 22 items. This
provides normative material for any person
who wishes to use the intact set of items with
some other groups. Table 4 supplements this
material with frequency distributions for
omissions and ambiguous responses.


Table 4

Frequency Distribution of Omitted or Ambiguous Items

Number of                    Secret
Items         Interview      Ballot
    0            388           414
    1             66            43
    2             17            23
    3             12            16
    4             11             2
    5              6           ...
    6              4             -
    7              1             -
    8              -             -
    9              -             -
   10              -             1
    M           0.48          0.31
   SD           1.13          0.82
    N            505           500


Table 5

Mean Number of “Maladjusted” Responses for Subgroups by
Age and Socioeconomic Level

                                 Interview             Secret Ballot
Subgroups                      N      X     SD        N      X     SD
Socioeconomic Level
  A  Upper class, White       52   1.52   1.89       52   2.10   2.43
  B  Upper-middle, White     153   1.90   2.25      148   2.34   2.46
  C  Lower-middle, White     200   2.53   2.82      200   2.85   2.80
  D  Lower class, White       20   2.25   2.24       20   3.30   3.02
     Negro                    80   2.75   3.26       80   2.94   2.95
Age
  18-25                       94   2.09   2.29       85   1.98   2.00
  26-39                      255   2.23   2.78      241   2.70   2.93
  40 and over                156   2.42   2.67      174   2.76   2.91


Table 5 shows the mean and standard
deviation of the number of “maladjusted”
responses for segments of the methods groups,
fractionated in one case by age and in the
other by socioeconomic level. An attempt was
made to adapt analysis of variance procedures
to testing the significance of the differences
associated with age, with socioeconomic status,
and with method.³ This test failed to show
any of the sources of variance to be significant
at the .05 level, though the three main effects
approached this significance level. The test
may not have been a very powerful one,
because of the necessity of rejecting part of the
data. Within the limits of this test, however,
we may still accept the hypothesis of no
difference between the subgroups. 
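
The asterisks in Table 2 mark pairs of percentages whose difference was judged significant at the .05 level. The article does not state which test was applied. The Python sketch below is therefore only an illustration under an assumed method, the usual large-sample z test for the difference between two independent proportions; the function name is hypothetical, and the group sizes are taken from the N row of Table 2 (omissions may have made the effective N for a given item slightly smaller).

```python
from math import erfc, sqrt


def two_proportion_z(pct1: float, n1: int, pct2: float, n2: int):
    """z and two-sided p for the difference between two independent percentages."""
    p1, p2 = pct1 / 100, pct2 / 100
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)            # pooled proportion under H0
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return z, erfc(abs(z) / sqrt(2))                    # normal two-sided p value


if __name__ == "__main__":
    # Item 8 (tired easily): interview 13.2 per cent vs. secret ballot 21.2 per cent.
    print(two_proportion_z(13.2, 505, 21.2, 500))
    # Item 1 (appetite): interview 3.0 per cent vs. secret ballot 1.8 per cent.
    print(two_proportion_z(3.0, 505, 1.8, 500))
```

Under this assumed test the interview vs. secret-ballot difference on item 8 exceeds the .05 criterion and the difference on item 1 does not, which agrees with the placement of asterisks in Table 2.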
Received November 21, 1951. 

³Owing to the extreme skewness of the
distributions and to the unequal cell frequencies,
certain adaptations had to be made. These included
a square-root transformation of the score scale and
the dropping of excess cases in certain cells.

The Influence of a Superficial Immediately Preceding
“Set” upon Responses to the Rorschach


Ralph D. Norman, Shephard Liverant, and Miriam Redlo 


The University of New Mexico 


The Rorschach is based upon the inter- 
action of three fundamental psychological con- 
cepts: set, projection, and perception. Op- 
erationally, although definitions vary widely, 
according to Gibson [10], set may be defined 
as a state of readiness which causes an organ- 
ism to respond to a stimulus in a potentially 
prescribed manner. Under the heading of sets 
may be included needs, drives, motives, etc., as 
determining tendencies. Projection is too well- 
known to discuss here. The fact that percep- 
tion is based upon already existing underlying 
needs has been amply demonstrated [6, 15, 16, 
17]. These assumptions are inherent in the 
rationale of projective devices, especially of the 
Rorschach, for the fundamental needs, drives, 
wishes, etc. are contained within the individ- 
ual as pervasive sets which are projected out- 
wards. Thus, if the Rorschach is to be con- 
sidered as a valid diagnostic clinical instru- 
ment, it is the underlying dynamic tendencies 
(or sets) which must determine the responses 
of the subject, not superficial conscious “sets”
due to any immediately preceding experience.

In reviewing the literature, one is struck by
the paucity of material concerning the
experimental study of the role of set with the
Rorschach. A number of experiments have been
reported in which the set of the subject is
controlled either by the instructions of the
experimenter or by changing the affective tone
of the testing situation. Fosberg [9] found
that instructions to give best or worst
impressions had no significant influence on the
basic Rorschach picture. Levine, Grassi, and
Gerson [12, 13] found marked changes in the
Rorschach of a single subject, corresponding
to hypnotically induced moods. Bergman,
Graham, and Leavitt [5] and Wilkins and
Adams [18] also found significant deviations
in the test correlating with suggestions given
under drugs and hypnosis. Hutt et al. [11]
report that Rorschach scores were altered
when subjects were given specific instructions
to look for important variables. Lord [14]
found that negative and positive rapport
conditions affect test performance, as do different
examiners. Abramson [1] experimentally
proved that Rorschach area responses may be
significantly altered by prestige suggestion.

¹The authors are indebted to Dr. Anne Roe for
her valuable suggestions and criticisms.

In all the above studies, except Fosberg’s,
significant variations in Rorschach protocols
were produced by experimentally induced sets.
Yet in all these studies, the set had been
directly related to the testing situation; it had
been forced upon the subject in such a manner
that he could not disregard the set and
respond in a way that was more compatible
with the basic rationale behind the test. Such
a procedure creates a setting that can and
should be controlled in the true testing
situation. Bellak [4] points out that projection
will vary in amount inversely with the
clearness of the stimulus and the exactness of
instructions. He adds further that the amount
of adaptive behavior will vary conversely with
the degree of exactness of definition of the
stimulus, and will also depend on the set or
“Aufgabe.” In view of Bellak’s considerations,
the foregoing studies might be said to
be measuring the subject’s adaptive behavior,
rather than true projection. Moreover, the
above experiments may also be considered in
light of Gibson’s conclusions from his critical
review:


A number of common assumptions about mental
set sometimes used in attempts to define the
concept are seen to be false. It cannot be defined as
established by, or dependent upon, verbal
instructions, since equivalent results can be obtained with
procedures which substitute for instructions
training, or a regular sequence of events. It cannot
even be assumed to be aroused by self-instructions,
since it may be unverbalized and may even be
unconscious. It cannot always be characterized as
temporary, since it may outlast the task which
occasioned it. It cannot always be defined either in
terms of a predisposition for reaction or for
perception, since these are semi-independent [10, p.
811].


The above comments and quotation are not
meant to detract in any way from the value
of the studies cited above, but rather to
contrast them with, and also to emphasize the
value of, the present investigation. Our study
seeks to inquire into the influence of an
immediately preceding, perhaps superficial,
nonemotional “set” which is apparently completely
dissociated from the testing situation itself.


Statement of the Problem 


As a validation of the Rorschach rationale
the following hypothesis was subjected to
experimental test: An immediately preceding
superficial “set” will not markedly alter a
subject’s responses on the Rorschach when the
latter is given immediately after “set,” as
compared to his ordinary responses on the test.
Since the validity of the test is based upon
responses being determined by deeper,
underlying sets and not by immediate, superficial
ones, experimental refutation of the preceding
hypothesis would seriously threaten the
validity of the Rorschach. Contrariwise,
verification of the hypothesis reinforces the validity
of the test. It is of interest that Coleman [7]
used a very similar hypothesis in attempting
to check the extent to which immediately
previous experiences were reflected in the TAT.
He points out that if the test’s clinical use is
to provide a measure of the patient’s basic
personality, the TAT should not be subject to
day-by-day fluctuation. By using a 15-minute
movie as the superficial “set,” he found that
the TAT was not significantly affected in
children. Would the same be true of the
Rorschach?

To keep the present study within as rigid
experimental controls as possible, the “sets”
given were highly specific. Therefore, if
significant variations were found, it could be
shown that they were due to the “set” and
little else. Two types of “set” were used, and
to further the validity of the hypothesis, they
were completely dissociated in the minds of the
subjects from the testing situation itself. One
of the sets may be termed a “food set,” used
because it was believed that the appearance
of food responses to any extent in the protocols
would be very significant, since food is not
commonly seen by normal subjects. From the
opposite point of view, the second “set,” which
portrayed human movement, was used
inasmuch as it is considered a very important and
common Rorschach index, and we wished to
determine if it too could be markedly
influenced by an immediately preceding experience.


Subjects and Procedure 


Twenty Rorschach-naive subjects, 13
females and 7 males, were selected from a course
in general psychology, and were arbitrarily
divided into four groups depending upon their
order of appearance for previously made
appointments as follows:


A1: The “food set” first, immediately followed
by the Rorschach and a week later by the
Rorschach alone.

1A: The Rorschach alone and a week later the 
“food set” immediately followed by the Rorschach. 

B2: The “movement set” first, immediately fol- 
lowed by the Rorschach and a week later the Ror- 
schach alone. 


2B: The Rorschach alone and a week later the 
“movement set” immediately followed by the Ror- 
schach. 


The main feature of this kind of design is that 
it controls the order of presentation leaving 
the “set” as the influencing variable to be ob- 
served. 

The actual testing was done by the two 
junior authors, each of whom worked with 10 
subjects. The factor of separate administrators 
was controlled by having the entire procedure 
administered by the same person to the same 
subject. Subjects appeared at the same hour 
for the second test as they did for the first. 
Time of day was not controlled in the “food
set,” since none of the subjects had fasted
more than 3 hours before taking the
Rorschach. Atkinson and McClelland [2] found
that a 4-hour fast had no influence on TAT
responses.

“Set” A consisted of 40 full-page, brightly
colored advertisements pertaining to food of
all kinds. “Set” B consisted of magazine ads
portraying M, i.e., people in action. Half of
the latter were chromatic. In order to dissociate
the set from the following Rorschach, each S
in groups A1 and B2, upon being presented
the advertisements in booklet form, was
instructed as follows: “I am running an
experiment in order to find out what types of
advertising are most appealing or effective for you.
Look carefully at each of the following
samples of well-known ads and indicate your
reaction by merely stating that you like or
dislike it.” For further impressiveness the S’s
reactions were noted on a specially prepared
form.

After this “set” procedure, the S was told
casually, “While you are here, I’d like your
cooperation in another experiment that is
being run.” Then he was given the Rorschach
according to Beck’s procedure [3]. For groups
1A and 2B, which were given the Rorschach
alone first, an identical procedure was used
with sufficient modification in the instructions
to make the dissociation between the “set” and
the Rorschach plausible. Average length of
time for looking at the advertisements was
about 20 minutes, with a range from 15 to 25
minutes.

Before being given the Rorschach the second
time each S in both Groups 1 and 2 was
instructed that this was not a test in memory
and that no attempt should be made to
remember what was seen the first time. Later
interview checks revealed that in no case had the S
associated the “set” with the testing situation.
To save time and not unnecessarily to compli- 
cate the study, only the free association part 
of the Rorschach was given each time. This 
shortening of procedure may be the most 
serious limitation of the study. 

After all the protocols were accumulated, 
they were identified and scored without refer- 
ence to type of “set” or order of presentation. 
Frequency of response was counted for only 
those categories which it was felt could be di- 
rectly influenced by the “set.” 


Results 


Table 1 presents the mean results of the
“set” and Rorschach-alone groups in selected
categories, as well as results of tests for
significance with t. (The latter statistic was
calculated with the formula for correlated
groups.) Table 1 fails to indicate any really
significant difference between the “set”
situation and the Rorschach-alone situation.² The
only possibly significant difference is between
number of R in the “movement-set” situation,
but the t is only slightly better than the 5 per
cent level. The reduced number of R in both
“set” situations might just as well have been
a product of fatigue perhaps as of “set.” The
categories expected to show a significant
difference following the “food set” were those
of food, C and CF. Since each picture was
vividly colored, C and CF might be expected
to have been influenced to some degree.
Similarly, the categories of M, H, and perhaps
indirectly Fm, might be expected to have been
influenced in the “movement-set” situation.
Such expectations were not fulfilled, and thus
our basic hypothesis is supported.


Table 1

Mean Results of Set-Rorschach and Rorschach-Alone
Groups in Selected Categories

                               R      ...     ...     ...     ...
Gps. (A1-I) + (1A-II)        29.4     3.1     6.2     9.2     0
  (Food Set-Rorschach)
Gps. (A1-II) + (1A-I)        35.0     6.1     3.1     9.4     0.2
  (Rorschach-Alone)
Difference                    5.6     3.0     3.1     0.2     0.5
t                             ...     ...     ...     ...     ...

                               R      ...     ...     ...     ...
Gps. (B2-I) + (2B-II)        28.9    14.3    11.0    25.1    18.1
  (Mvt. Set-Rorschach)
Gps. (B2-II) + (2B-I)        34.4    16.0     9.3    25.7    15.4
  (Rorschach-Alone)
Difference                    ...     ...     ...     ...     ...
t                             2.5*    0.6     0.6     0.1     1

*Significant at the 5 per cent level.


²The authors are aware of Cronbach’s injunction
[8] against use of percentages in Rorschach
statistical work. Also the use of percentages with the t
statistic may be questioned. (A simple analysis of
variance also gave negative results.) Ratio scores
were used because our R’s were comparatively
large, a situation which Cronbach intimates is safer
when percentages are employed.

Aside from the criticism of this study that 
the free association alone was used, a serious 
objection might be raised that the intended 
“set” was so superficially presented that per-
haps it was not set at all. But if instructions
were given to look for food or movement per- 
cepts in the cards, we would have undoubtedly 
found them in the “set” situation, as previous 
experimentation has shown. Such was not our 
purpose at all. The crux of the matter lies in 
the definition of “‘set,”” and we tried to avoid 
a situation corresponding to Bellak’s point 
that there are those “sets” which create per- 
haps more adaptivity than others. 


Summary 


An experimental attempt was made to veri- 
fy the hypothesis that an immediately preced- 
ing superficial “set” will not markedly influ- 
ence the number and kind of responses to the 
Rorschach. Two such “sets” were evoked, 
using magazine advertisements as stimuli. One 
was a “food set”; the other a “movement set.” 
With 20 subjects, it was found that responses 
which might be expected to alter because of 
the nature of the “set” remained stable when 
compared to responses in a normal situation, 
thus verifying the basic hypothesis. 


Received November 9, 1951. 
The Retest Reliability of the Group Rorschach
and Some Relationships to the MMPI¹


Richard Blanton and Ted Landsman 


Vanderbilt University 


Since the publication of Munroe’s mono- 
graph [9] interest in the group-administered 
Rorschach and in checklist scoring has grown 
considerably. The screening of large groups 
by means of an easily administered and scored 
projective device offers attractive possibilities 
for psychologists in personnel and counseling 
work, as well as in clinical fields. This inter- 
est raises important issues concerning the reli- 
ability of the method and its relationships to 
other popular screening devices. 

Munroe has found the interjudge relia- 
bility of ratings of abnormality from her check 
list to be acceptable. Since these ratings re- 
quire the subjective weighting of the pattern 
of responses, it is likely that interjudge relia- 
bility in scoring by the checklist is even higher 
than her obtained p of .65. Of particular in- 
terest to measurement psychologists is her as- 
sertion that the numerical sum of signs is of 
considerable validity in identifying malad- 
justed subjects, though it is well to bear in 
mind that she does not consider this sum as 
valid as her adjustment-rating method. The 
sum of signs, however, offers advantages for 
statistical manipulation. There are no serious 
statistical problems involved, and the scores 
can be examined by refined methods for sta- 
bility. 

Questions concerning the form of the dis- 
tribution of these scores and the possibility of 
curvilinear regression on other measures have 
been raised, notably by Cronbach [4]. There 
is, as well, the question of the nature of any 
stability which they might demonstrate. The 


1Read at the annual meeting of the American 
Psychological Association before the Division of 
Counseling and Guidance, Chicago, Illinois, Sep- 
tember, 1951. The authors wish to acknowledge 
with gratitude the aid of Dr. Daniel Broida, Miss 
Vivian Thackaberry, and Mr. Norman Harway. 
following study was designed to answer the
first of these questions, that of the form of the 
distribution of the scores and their stability. 
Relationships shown by some aspects of the 
Inspection Rorschach to items on some of the 
MMPI scales have been explored by other in- 
vestigators [1, 2, 9]. In this study we wished 
to make general exploration of these relation- 
ships. 


Procedure. The Harrower-Erickson group-
administered form of the Rorschach and the
abbreviated group form of the MMPI were
given to 126 third-year college students, of
whom 41 were women and 85 were men, in
connection with a study of teaching methods
[7]. After three months both tests were read-
ministered. Some students were absent from
the testing periods. A total of 100 subjects
took all four tests, 105 took both Rorschach
tests, and 119 both MMPI tests. After each
testing the Rorschach protocols were scored
by Munroe’s Inspection Method and the Re-
vised Checklist and the sum of signs was tall-
ied. Three persons did the scoring, each taking
one-third of the protocols for both Test I and
Test II. The MMPI scales were weighted
with K and profiles were constructed for each
subject.


Self-correlations for the Rorschach and for 
all MMPI scales were computed for both sets 
of tests. Pearson r was also computed for each 
Rorschach distribution with all corresponding 
MMPI scales. The MMPI profiles were then 
sorted into deviant and nondeviant groups by 
means of Meehl’s criteria [8]. The number 
of deviant profiles obtained was so small that 
the criteria were modified. The cutting score 
for L was lowered from 60 to 55 for those 
profiles showing a high point of not less than 
70, exclusive of Mf. This yielded 20 and 21 
profiles respectively for the two distributions.
An arbitrary cutting score of 12 was estab- 
lished for the Rorschach distributions, which 
yielded 23 and 24 scores for Tests I and II 
respectively. Chi square was computed for 
both sets of dichotomized scores. Yates’s cor- 
rection was applied in both cases. Biserial r 
was then computed for both sets of Rorschach 
scores with both sets of dichotomized MMPI 
profiles. MMPI scores for all subjects obtain- 
ing a deviant profile on either test were aver- 
aged, and an average profile constructed for 
each. These were then evaluated by our cri- 
teria. Both sets of Rorschach scores were aver- 
aged, and biserial r computed for the composite. 
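
The chi-square step with Yates's correction can be sketched for a 2 x 2 table as follows. This is only an illustrative sketch: the function name is hypothetical, and the cell counts in the example are invented, since the article reports only the chi-square values and not the underlying cells.

```python
def yates_chi_square(a, b, c, d):
    """Chi square with Yates's continuity correction for the 2 x 2 table
    [[a, b], [c, d]] (for example, Rorschach deviant/nondeviant by
    MMPI deviant/nondeviant)."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be positive")
    # Yates's correction shrinks |ad - bc| by n/2, but never below zero.
    adjusted = max(abs(a * d - b * c) - n / 2.0, 0.0)
    return n * adjusted ** 2 / (row1 * row2 * col1 * col2)


# Invented cell counts, for illustration only (not the study's data).
example_chi_square = yates_chi_square(10, 13, 10, 67)
```

With one degree of freedom, a value above 3.84 would be significant at the .05 level.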


Table 1
Means, Standard Deviations, and Self-Correlations
of Rorschach and MMPI Distributions

Scale   Test     M       σ       Self-r
R       I        8.9     4.5     .66
R       II       8.1     4.1
Hs      I       51.5     7.6     .48
Hs      II      51.1     6.4
D       I       50.0     7.5     .66
D       II      47.6     8.8
Hy      I       55.6     7.8     .57
Hy      II      57.1     6.4
Pd      I       53.7     8.2     .63
Pd      II      55.4     8.4
Mf      I       56.8    10.7     .77
Mf      II      57.2    10.4
Pa      I       52.7     6.9     .59
Pa      II      53.1     6.6
Pt      I       53.1     [illegible]  .59
Pt      II      52.4     6.7
Sc      I       53.3     7.9     .57
Sc      II      54.8     7.4
Ma      I       56.1     8.7     .63
Ma      II      57.0     8.0





Results. Means, σ’s, and self-correlations
for both tests are shown in Table 1. The dis- 
tributions of the Rorschach scores closely ap- 
proximated normality. No significant skewness 
or kurtosis appeared in either case. 

The distributions of the scores on the MMPI 
scales showed a lack of variability in some 
cases. The standard deviation for a set of T 
scores should be 10. F tests comparing our ob- 
tained variances to a hypothetical variance of 
100 show that our population differs from the 
parent population of the test in 15 out of 18 
instances with significance at below the .02 
level. Our scores should not then be expected 
to yield high self-correlations, and this proves 
to be the case. Self-r’s range from .47 for Hs, 
the scale with the smallest variability, to .77 
for Mf, the scale with the largest variability. 
Correction for restriction of range² yields coef-
ficients quite similar to those reported by other
investigators [5, 6]. 
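
A correction of this kind can be sketched as below. The article does not say which formula was applied, so the one used here is an assumption: the common single-truncation form, which treats error variance as equal in the restricted and unrestricted groups. The function names are hypothetical; the example inputs are the D-scale values from Table 1 (Test I σ of 7.5, self-r of .66) set against the nominal T-score σ of 10.

```python
def variance_ratio(sd_observed, sd_reference=10.0):
    """Ratio of the observed variance to a reference variance
    (T scores have a nominal standard deviation of 10)."""
    if sd_observed <= 0 or sd_reference <= 0:
        raise ValueError("standard deviations must be positive")
    return (sd_observed / sd_reference) ** 2


def reliability_in_unrestricted_group(r_restricted, sd_restricted,
                                      sd_unrestricted=10.0):
    """Estimate the reliability expected in the wider group, assuming the
    error variance is the same in both groups (an assumed formula, not
    necessarily the one the authors used)."""
    ratio = variance_ratio(sd_restricted, sd_unrestricted)
    return 1.0 - ratio * (1.0 - r_restricted)


# D scale, Test I values from Table 1: self-r = .66, sigma = 7.5.
d_scale_estimate = reliability_in_unrestricted_group(0.66, 7.5)
```

For these inputs the estimate comes to about .81, which shows how much a narrow range alone can depress a retest coefficient.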

Correlations between Rorschach scores and 
MMPI scales are positive in all but 2 cases; 
however, they are uniformly low (see Table 
2). Only one was statistically significant, that 
with the D scale of Test I, but this relation-
ship did not recur on Test II. Since 18 r’s
were computed, it is likely that this coefficient 
of .23 is due to chance. 
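
The chance argument can be made concrete with a short sketch. It assumes, for simplicity, that the 18 coefficients are independent (they are not strictly so, since the scales are intercorrelated) and that N = 100 as stated in the title of Table 2. The function names are hypothetical.

```python
import math


def prob_at_least_one_false_positive(n_tests, alpha=0.05):
    """Probability that at least one of n independent tests reaches
    significance at level alpha when every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** n_tests


def t_for_correlation(r, n):
    """t statistic for testing a Pearson r against zero (n - 2 df)."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


chance_of_spurious_hit = prob_at_least_one_false_positive(18)
t_for_d_scale = t_for_correlation(0.23, 100)
```

Under these assumptions the chance of at least one spuriously "significant" coefficient among 18 is about .60, while the single r of .23 gives a t of roughly 2.34, only modestly past the .05 criterion.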

Chi square for the categorized Rorschach
scores and MMPI profiles for Test I was 1.07,
p greater than .30; for Test II, chi square
was 4.67, p less than .05. Biserial r for Test I
was .16, for Test II .22, for the composite
.12. The coefficient of .22 is significant at the
.05 level; the others are not significant.

Discussion. The findings of this study with 
respect to reliability of both the MMPI and 


Table 2
Correlations of the Rorschach Check-List Scores of
100 Subjects with MMPI Subtest Scores,
Retested After Three Months

          Hs    D     Hy    Pd    Mf    Pa    Pt    Sc    Ma
Test I    .01   .23   .13   .10   .08   .11   .09   .06  -.01
Test II   .08   .01   .13   .01   .13   .11  -.08   .02   .12





2Since no good formula is available for estimat- 
ing reliability when both distributions are trun- 
cated, that for one truncated distribution was used. 
Hence, the estimates cannot be said to be even ap- 
proximate, and are not given here. 
















the Rorschach must be regarded with caution. 
The logical problems involved in estimating 
the stability of Rorschach factors are serious 
since individual performance has, in a sense, 
concept-formation and problem-solving aspects. 
A retest correlation for such a test may not 
show to what extent stability of performance 
is a function of stable traits. This is because 
the subject may be changed by the test itself. 
In fact, actual changes in personality may 
fail to be revealed by a retest. This effect may 
be somewhat mitigated by a long interval be- 
tween tests, and our finding that considerable 
variability in individual performance may be
expected after three months is reassuring. The 
lack of reliability of the MMPI scale shown 
here is a function of the homogeneity of the 
population and the long interval between tests. 
It should be remarked that this lack of vari- 
ability constitutes a disadvantage in the use of 
the MMPI with college populations in addi-
tion to other disadvantages reported in the lit-
erature³ [3]. The Rorschach, on the other
hand, is probably increasingly variable with 
increasing mental ability and hence may show 
maximum variability when used with college 
groups. This fact, together with the findings 
of a satisfactory form of distribution and some 
stability of indices may be considered an ad- 
vantage in the use of this test for screening at 
the college level. 

Munroe’s opinion that 10 may be taken as 
a cutting score should not appear justified un- 
less more than 40% of college students are 
significantly maladjusted, a notion that some 
counselors might not find acceptable. In any 
event, evaluation of the check lists by experi- 
enced judges should be made, and for this pur- 
pose 10 might be a legitimate cutting score for 
reducing false negatives. 
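
The "more than 40%" figure can be checked with a rough sketch that treats the sum of signs as normally distributed with the Test I mean of 8.9 and σ of 4.5 from Table 1. It ignores any continuity correction for the integer-valued score, and the function name is hypothetical.

```python
import math


def proportion_at_or_above(cutoff, mean, sd):
    """Upper-tail area of a normal distribution at the given cutoff."""
    if sd <= 0:
        raise ValueError("sd must be positive")
    z = (cutoff - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Test I values from Table 1: mean 8.9, sigma 4.5.
share_flagged_at_10 = proportion_at_or_above(10, 8.9, 4.5)
share_flagged_at_12 = proportion_at_or_above(12, 8.9, 4.5)
```

Under that normal approximation a cutting score of 10 would flag roughly 40 per cent of the group, against roughly a quarter at the cutting score of 12 used in this study.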

Summary and Conclusions. The Harrower- 
Erickson group-administered form of the Ror- 
schach and the abbreviated group form of the 
MMPI were given to 126 third-year college 
students. After 3 months both tests were re- 
administered. Conclusions may be summarized 
as follows: 

1. The distribution of the sum of signs of
the Group Rorschach is approximately normal
and the self-correlation of the test suggests
that responses have considerable stability,
though the meaning attached to this stability
is in doubt.

2. The distributions of scores on the MMPI
scales show considerable lack of variability in
some cases, suggesting that college populations
are quite homogeneous with respect to this
test. Self-correlations of the scales, corrected
for truncation, are similar to those reported in
the literature.

3. Our results show that the two tests prob-
ably hold some variance in common, but this
communality is not large and we feel that they
have different functions as far as college popu-
lations are concerned. Certainly neither can be
considered an effective substitute for the other.
Further validity studies should be made to de-
termine the applicability of both these tests for
screening at the college level.

3We are informed that college norms are now
being developed at the University of Minnesota
for the MMPI.

Received November 14, 1951. 
Wechsler-Bellevue and WISC Scattergrams
of Unsuccessful Readers 


E. Ellis Graham 


University of Denver 


Varied theories of causes of reading failure 
in children have been expounded within the 
last half century. Most recent have been those 
which try to explain the failures through con- 
comitant emotional factors. In such a study 
in 1949, Croley [1] observed that the profile 
which Wechsler [6] indicates as typical of the 
adolescent psychopath was very similar to that 
being obtained from adolescents experiencing 
reading difficulty. Using DuMas’ [2] com- 
parison of slope technique, he created a rank 
order profile for the Wechsler-Bellevue (WB) 
subtests according to Wechsler’s description of 
the adolescent psychopath and compared this 
with the mean profiles of 45 unsuccessful read-
ers (URs). The resulting r_ps was 1.00. Un-
fortunately only eight subtests were compared.
In 1951, Hurst and Portenier [4] reported 
similar findings. At about the same time Gra- 
ham [3] remarked the similarity between the 
profile frequently obtained by URs and that 
ascribed to adult hysterics and tentatively hy- 
pothesized that reading, because of its com- 
municative nature, lends itself as a ready sym- 
bol for repressed or suppressed resistance to 
smothering, oppressive, or hostile emotional cli- 
mates encountered by the child. 
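
The idea of comparing two profiles by their slopes can be sketched as follows. The source does not give the formula of DuMas' technique, so the index below (twice the proportion of adjacent segments sloping the same way, minus one) is an assumed simplification, and the function names and example profiles are hypothetical.

```python
def slope_signs(profile):
    """Sign (+1, 0, -1) of the slope between each adjacent pair of points."""
    signs = []
    for left, right in zip(profile, profile[1:]):
        diff = right - left
        signs.append((diff > 0) - (diff < 0))
    return signs


def slope_agreement_index(profile_a, profile_b):
    """Agreement of slope directions, scaled from -1 (all segments
    opposed) to +1 (all segments agree)."""
    if len(profile_a) != len(profile_b) or len(profile_a) < 2:
        raise ValueError("profiles must have the same length, at least 2")
    signs_a = slope_signs(profile_a)
    signs_b = slope_signs(profile_b)
    same = sum(1 for sa, sb in zip(signs_a, signs_b) if sa == sb)
    return 2.0 * same / len(signs_a) - 1.0


# Invented profiles: same shape at different levels, then mirror images.
same_shape = slope_agreement_index([3, 1, 4, 2], [6, 2, 8, 5])
mirror_image = slope_agreement_index([1, 2, 3], [3, 2, 1])
```

An index of this sort looks only at direction, not at level or steepness, and with eight subtests there are only seven segments to compare.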

To recheck the URs’ WB profile and to sub- 
mit it to the scrutiny of others interested, 
Wechsler tests which had been administered 
to 96 URs were withdrawn from the files of 
the Psychological Service for Children at the 
University of Denver and statistically com- 
pared. These tests had been gathered over a 
four-year period during the processes of clin- 
ical diagnosis. They constituted the entire pop- 
ulation so tested who met the requirements of 
the operational definition of the UR. The UR 
was defined as a child between the ages of 8-0 
and 16-11 who achieved either a Verbal or
Performance Scale IQ of 90 or higher, who
had fallen 25 per cent or more below the mean 
reading grade level on the Wide Range 
Achievement Test [5], for a child of his 
chronological age, and who had attended pub- 
lic or private school for the expected number 
of years for his given age. 
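
The operational definition can be written out as a simple screening rule. This is a sketch only: the field names are hypothetical, and reading "25 per cent or more below" as a reading grade at or under 75 per cent of the expected grade is an assumption about how the criterion was applied.

```python
from dataclasses import dataclass


@dataclass
class Child:
    age_months: int                 # chronological age in months
    verbal_iq: float
    performance_iq: float
    reading_grade: float            # reading grade on the achievement test
    expected_reading_grade: float   # mean reading grade for the child's age
    attended_expected_years: bool   # schooled for the expected number of years


def is_unsuccessful_reader(child):
    """Apply the four parts of the operational definition of the UR."""
    in_age_range = 8 * 12 <= child.age_months <= 16 * 12 + 11  # 8-0 to 16-11
    adequate_iq = child.verbal_iq >= 90 or child.performance_iq >= 90
    # Assumed reading: 25 per cent or more below the expected grade level.
    reading_deficit = child.reading_grade <= 0.75 * child.expected_reading_grade
    return (in_age_range and adequate_iq and reading_deficit
            and child.attended_expected_years)


# Invented case: a 13-year-old reading at grade 4.5 against an expected 7.5.
example_child = Child(156, 88, 104, 4.5, 7.5, True)
example_is_ur = is_unsuccessful_reader(example_child)
```

Note that the IQ criterion is satisfied by either scale, so a child with a low Verbal IQ still qualifies if the Performance IQ is 90 or higher.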

Fifty-four children had been given the 
Wechsler-Bellevue Form I (WB I); 11 had 
been given the Wechsler-Bellevue Form II 
(WB II); and 31 had been given the Wech- 
sler Intelligence Scale for Children (WISC). 


The results are indicated in Table 1. 


Table 1
Mean Scaled Scores and Rank Order of
WISC Tests

Columns, left to right:
Mean Wtd Scores, WB I (Mean CA 155 mos., N 54)
Mean Wtd Scores, WB II (Mean CA 162 mos., N 11)
Mean Scaled Scores, WISC (N 31)
Rank Order WB I Full Scale
Rank Order WB II Full Scale
Rank Order WISC Full Scale
Rank Order WB I Vbl & Pfmnc
Rank Order WB II Vbl & Pfmnc
Rank Order WISC Vbl & Pfmnc





Information SS 26 4A 39 6 4 89 
Comprehension 85 9.0110 5 5 311 2 
Digit Span A. 64.95.90 700°9: 5 5 § 
Arithmetic 40 54 83111111 6 6 6 
Similarities as. 374315 6 8 1 2:35 
Vocabulary Ss. eS. On 8 69 3 2 F 
Pic. Arrange 96 111104 22622 4 
Pic.Completion 9.4 9.5112 3 42 3 4 1 
Block Design 92 0M5MSF 436463 G 
Obj. Assembly Ss 066 208: '3,-3:; 5°93 2.2 
Digit Symbol G8. S23 @€5:2.2:6%.4 $5 





Only three children taking the WB tests at-
tained Verbal Scale (VS) IQ’s of 110 or high-
er, and only one obtained a VS IQ above 120.
Thirty-three attained Performance Scale (PS)
IQ’s above 110 and four achieved PS IQ’s
above 120. Of the 65, 61 had PS IQ’s higher
than VS IQ’s. The difference found on the
WISC was much less pronounced and 12 of
the 31 had VS IQ’s equal to or higher than
corresponding PS IQ’s. The mean FS IQ for
all WB tests was 97.1, for all WISC tests,
100.3. The mean VS IQ for all WB tests was
88.4, for all WISC tests, 98.9. The mean PS
IQ for all WB tests was 107.1, for all WISC
tests, 101.7.

To demonstrate the similarity of the current
findings with those of Croley, and to show
the similarity to the psychopathic profile which
he projected, one of his graphs is reproduced in
Figure 1 and over it is drawn the rank-order
profile of the 54 URs given the WB I and the
11 URs given the WB II.

Fig. 1. A rank-order comparison of Croley’s poor
readers, the URs of this study, and the Croley pro-
jection of the adolescent psychopath.

Croley eliminated from his study Vocabu-
lary (because it was omitted in several of his
subjects’ tests), Similarities (because it did not
meet the test of normality), and Digit Symbol
(because it did not deviate significantly from
the expected mean). These are important omis-
sions and must be included if the profile is to
have real value. They have been included in
this study as indicated in Table 1 and Figure 2.

Fig. 2. A comparison of the rank-order profiles of
the WISC, WB I, and WB II tests administered to
URs.

The URs’ superiority in performance items
over verbal might be hypothesized to be due
to an inherent lack of verbal ability, an inter-
ference with verbal ability due to repressions,
or to the failure to learn to read. Whatever
these factors are, it appeared in this small sam-
ple that the WISC was less influenced by them
because 12 of the 31 children demonstrated
verbal ability as measured by the test at least
equal to performance ability. Would the scat-
tergrams of these 12 differ markedly from
those of the 19 who were superior in perform-
ance? Or would the patterns contain certain
basic similarities? Table 2 and Figure 3 show
that Digit Symbol is routinely the low test of


Table 2
Means, SD’s, SD_M’s, and t Scores of WISC Subtests

                     31 cases            19 cases            12 cases
                     M  SD  SD_M  t      M  SD  SD_M  t      M  SD  SD_M  t
Information 9.42 2.3 +2 1,14 9.11 ; ; 1.62 9.92 2.27 8 21 
Comprehension 11.03 2 9 1.69 10.52 81 f 11.83 2.23 84 1.94 
Arithmetic 8.26 4.0 55 4.16°* 7.32 »4Y 9 +,54°* 9.75 4.14 1.19 21 
Similarities 11.61 2.0 +7 +.4°* 11.63 ) +8 4.39°* 11.58 1.94 73 2.16 
Vocabulary 81 2.42 14 2.7* 8.63 1 8 2.46" 9.08% 2.22 ae 
Digit Span 9.52 »19 49 99 9.42 77 75 9.66 1.1 9 a5 
Picture Completion 11.19 1.08 7 1.9 12.37 ; 14 1.06** 9.33 1.8% 71 105 
Picture Arrangement. 10.39 2.22 «~4l 95 10.74 ? 1.34 9.843 1.26 70 24 
Block Design 10.87 2.88 53 1.64 11.58 44 42 > 54° .75 »29 1.09 23 
Object Assembly 10.65 2.99 $5 1.18 11.84 6% ».75* 8.75 2.05 77 142 
Digit Symbol 8.52 445 63 2.22* 9.47 $2 g1 65 7.0 a5 1 


*Significant beyond the .05 level of confidence. 
**Significant beyond the .01 level of confidence. 



































Fig. 3. On either side of the mean WISC scores
for 31 URs are shown the mean scores of the 19 who
were superior in performance and of the 12 who
were equal or superior in verbal. (Legend: 31 URs
tested; 19 URs superior in performance; 12 URs
equal or superior verbal.)


the PS. Croley [1] had found it near the mean
for the subtests. This appears to be the case
when the PS is higher than the VS, but ad-
herence to the mean does not reduce the diag-
nostic significance when the test is ranked with
other performance items. Arithmetic, on the
other hand, while retaining profile similarity,
shifts from its position of lowest test to a posi-
tion of much less diagnostic significance when
the verbal tests are higher than nonverbal.

Discussion. It was found that for the 96
Wechsler intelligence scales administered to
the URs, Arithmetic, Digit Span, Information,
Digit Symbol, and Vocabulary averaged be-
low the mean. Object Assembly, Picture Com-
pletion, Picture Arrangement, Block Design,
Comprehension, and Similarities averaged
above the mean. Except for Similarities, there
seemed to be no wide divergence of rank be-
tween the WB tests and the WISC. Ranking
of the VS and PS subtests separately had the
effect of emphasizing the relative positions of
those tests which appear to be most influenced
by reading failure. Digit Symbol was usually
the lowest PS item. Comprehension and Simi-
larities were generally the high VS tests.

The scattergram of the UR corresponds
closely to the scattergram described by Wech-
sler for the adolescent psychopath. Whether or
not young criminals and psychopaths might be
expected to be poor in reading due to school
difficulties bears consideration. It should be
borne in mind, however, that the population of
this study was educationally retarded, but not
in trouble with the law. It, therefore, does not
seem unreasonable to assume that this profile
is typical of the educationally retarded youth
without regard to his moral qualities.

It further appears that the subtest items in
which the UR routinely experiences his great-
est successes are those most distant from sug-
gestions of the classroom. Of the performance
tests, Digit Symbol most closely resembles the
original reading learning situation. Arithme-
tic, the rote memory required for Digit Span,
and the recall type of question involved in In-
formation all most closely resemble the school
situation. If the UR is resisting, unconsciously,
the emotional climate of the school or home,
this should be expected. If this resistance is
passive, there may be no other noticeable trou-
ble. If it is active, it might well be manifest
through truancy and other violations of au-
thority.


Received December 19, 1951. 
A Factor-Analytically Based Rationale
for the Wechsler-Bellevue¹,²


Jacob Cohen 


Bronx Veterans Hospital and New York University School of Education 


Factor analysis makes possible an objective 
approach to the understanding of the psycho- 
logical functions which underlie performance 
on intelligence tests. This powerful analytic 
tool has been widely and fruitfully employed 
by psychometricians, but has, until quite re- 
cently, been largely avoided by researchers in 
clinical psychology. Thus, the two major au-
thors on the Wechsler-Bellevue Intelligence
Scale, Wechsler [6] and Rapaport [4] either 
completely ignore or quickly dismiss factor 
analysis as a means of understanding what 
functions the subtests measure. They have re- 
lied instead on clinical experience and theoreti- 
cal formulations based upon such experience. 
It seems to the present author that the value 
of objective test scores is somewhat vitiated 
when their interpretation proceeds from large- 
ly subjective test rationales. Clinical experience 
is far more profitably employed following ob- 
jective analysis than when supplanting it. 

A previous article [2] reported the results 
of a comparative factor analysis of the Wech- 
sler-Bellevue subtests for psychoneurotic, 
schizophrenic, and brain-damaged male veteran 
patients between the ages of 20 and 40 (100 
patients per group).³ The major finding was


1Reviewed in the Veterans Administration and
published with the approval of the Chief Medical 
Director. The statements and conclusions published 
by the author are the result of his own study and do 
not necessarily reflect the opinion or policy of the 
Veterans Administration. 


2This article is based in part upon a doctoral 
dissertation submitted to the School of Education, 
New York University. The author wishes to express 
his thanks to his mentor and friend, Prof. Avrum 
H. Ben-Avi. 

3The author wishes to acknowledge gratefully the
cooperation of Dr. H. L. Flowers, Chief, Neuro- 
psychiatric Service, and Dr. Robert S. Morrow, 
Chief Clinical Psychologist, of Bronx Veterans Ad- 
ministration Hospital. 


that the same oblique (i.e., correlated) com-
mon factors are involved in the functioning of
all three groups. Briefly described, these are:

Factor A (Verbal) This involves richness of vo-
cabulary and verbal-symbolic manipulative ability.

Factor B (Nonverbal Organization) The ability
to organize visually-perceived (nonverbal) material
into meaningful wholes (not necessarily involv-
ing spatial relationships), against a time limit.

Factor C (Freedom from Distractibility) A con-
ative factor which makes it possible for problem
elements to “register” and to be maintained without
loss in the course of manipulation, i.e., the ability
to attend or concentrate.


As a consequence of the correlation of these 
three common factors, a general factor could 
be extracted in each group and was interpreted 
as present general intellectual functioning and 
symbolized as G. 

The purpose of the present article is to pre-
sent a rationale for the Wechsler-Bellevue sub-
tests based on the above-mentioned study [2].⁴


Information 


This test consistently loads the Verbal fac- 
tor and does so approximately to the same 
degree in the three groups. Through its high 
verbal loading, it also measures G consistently 
well (compared to the other subtests). 

Since the communality of this test is among 
the highest, and since part of its low unique- 
ness must be error variance (lack of reliabil- 
ity), very little specific variance is left, i.e., it 
can only measure to a slight extent anything 
other than the Verbal factor (and through it, 


4The present article provides a self-contained
rationale for the Wechsler-Bellevue subtests. For 
the actual centroid and obliquely rotated common 
factor loadings, communalities, intercorrelations of 
the common factors, and G loadings of the subtests 
in each of the three neuropsychiatric groups, the 
reader is referred to [2]. 
G). Thus, the test rationales of Wechsler and
Rapaport are questionable. Wechsler [6, pp. 
77-78] states that this test measures “range of 
information,” which is not of much conceptual 
utility. Rapaport [4, pp. 129-131] states that 
fund of information is a function of cultural- 
educational environment, and that a dynamic 
memory factor operates either to make avail- 
able or fail to make available because of re- 
pression, items of information which were once 
known. If this analysis is valid, it must apply 
to the Verbal factor as a whole and all the 
tests which measure it. An alternative to this 
hypothesis is that Information uniquely taps 
these functions, in which case the variance 
from this source must reside in the small spe-
cificity of the test, and can hardly be an im-
portant factor in accounting for the variability
of Information scores.
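
The variance argument used here can be sketched numerically. In the usual factor-analytic partition, a test's total variance (taken as 1) splits into communality, specificity, and error, with uniqueness being specificity plus error. The figures in the example are invented for illustration and are not the loadings reported in [2]; the function name is hypothetical.

```python
def variance_components(communality, reliability):
    """Partition unit test variance into common, specific, and error parts.

    communality: share of variance explained by the common factors
    reliability: share of variance that is not error
    """
    if not 0.0 <= communality <= reliability <= 1.0:
        raise ValueError("need 0 <= communality <= reliability <= 1")
    return {
        "communality": communality,
        "uniqueness": 1.0 - communality,           # specific + error
        "error": 1.0 - reliability,
        "specificity": reliability - communality,  # what only this test measures
    }


# Invented values: a test with high communality and ordinary reliability.
parts = variance_components(communality=0.70, reliability=0.85)
```

With a high communality, whatever is left for specificity after error is removed is necessarily small, which is the point made about Information.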


Comprehension 


The Comprehension test is also consistently 
one of verbal ability, but its efficiency as a 
measure of both the Verbal factor and G varies 
among the groups. When its factor loadings 
and uniqueness in the three groups are jointly 
considered, the following state of affairs 
emerges: The Comprehension test is a moder- 
ately good measure of verbal ability and a 
very good measure of G for psychoneurotics, 
and its measurement of these factors (plus er- 
ror) occupies almost its complete variance. For 
brain-damaged patients, the test is again a good 
measure of verbal ability but a relatively poor 
measure of G, a considerable part of its total 
variance being otherwise deployed in specific 
factors (and error). For schizophrenics, Com-
prehension is still a fairly good measure of ver-
bal ability but a relatively poor measure of G, 
with high uniqueness encroaching upon the ex- 
tent to which it measures these factors. The 
problem, then, is to ascertain what happens to 
the “lost” communal variance. 


An hypothesis is suggested by the considera- 
tion of this test as a measure of “judgment” 
presented in the discussions of Wechsler [6, p. 
81] and Rapaport [4, pp. 110-114]. The hy- 
pothesis is that in not-severely-disorganized in- 
dividuals (e.g., psychoneurotics), the variabil- 
ity of Comprehension scores is largely a func- 
tion of variability in verbal ability and G, and 


less of variability in judgment. In the more 
severe disorganizations of psychosis and brain 
damage, the scores vary less with verbal ability 
and G, and more with the specific factor of 
judgment. 

Quite independent of the validity of this
hypothesis, the findings indicate that for schizo-
phrenics Comprehension scores do not provide
good indices of present general intellectual
functioning, a fact corroborated by clinical ex-
perience. The same is apparently true for
brain-damaged patients. For psychoneurotics,
on the other hand, the analysis shows that
this test score provides a much better index of
G than in the other groups.


Digits Forward and Digits Backward 


These two tests were analyzed separately
[2], but can be discussed together because their
factorial characteristics are very similar
throughout (with only one exception which
will be discussed below). Wechsler’s combina-
tion of these tests into a single subtest [6, pp.
84-85] is fully justified by their factorial sim-
ilarity [2].

When these tests are combined into the
Digit Span subtest, the result is a test which
measures Factor C (Freedom from Distracti-
bility) better than any other test in all three
groups, although it is only moderately related
to G. The uniqueness of Digits Forward and
Digits Backward is among the highest for all
the groups, which partly reflects their probably
low reliability and is partly indicative of what-
ever specific factors they measure.

Rapaport [4, pp. 176-179] holds Digit Span 
to be primarily a test of attention, a function 
which he sharply distinguishes from concen- 
tration, the latter being measured by the Arith- 
metic test in his rationale. This sharp distinc- 
tion between “attention” and “concentration” 
and therefore between the functions measured 
by Digit Span and Arithmetic, is not sustained 
by the present findings [2]. These tests load 
Factor C (and only this common factor) to- 
gether, and are therefore both measuring to a 
considerable extent the same thing. 

Digits Forward and Digits Backward differ 
in the degree to which they measure Factor C 
only in the brain-damaged patients, where the 
Digits Backward test is appreciably more effici-
ent. Related to this is the fact that it also shows
considerably less uniqueness. Also related is the
fact that it is much more closely related to G. 
Thus, for brain-damaged patients as a group, 
the Digits Backward test is superior to the 
Digits Forward test in measuring both Free- 
dom from Distractibility and present general 
intellectual functioning; for psychoneurotics 
and schizophrenics, these tests are about equal 
in these regards. 


Arithmetic 


The Arithmetic test is the first test which is 
complex and therefore offers ambiguity in its 
interpretation. Unlike the tests thus far dis- 
cussed, it does not measure a single factor in 
all the groups. The Arithmetic test is found 
to measure the Freedom from Distractibility 
factor in the psychoneurotic group, the Ver- 
bal factor in the schizophrenic group, and both 
these factors in the brain-damaged group. The 
complexity of this test provides a good exam- 
ple of a state of affairs, discoverable only 
through the comparative factor-analytic tech- 
nique, whose existence leads to much confu- 
sion in pattern-analytic interpretation and re- 
search. 

Both Wechsler [6, p. 82] and Rapaport [4, 
p. 111] describe this test’s function in terms 
which sample the concept of Freedom from 
Distractibility, but both fail to mention the 
verbal function it taps in two of our three 
groups. 

Wechsler states that arithmetic tests “corre- 
late highly with global measures of intelli- 
gence” [6, p. 82]. This may be true for clin- 
ically normal subjects, but for the patient 
groups studied this is true (in terms of corre-
lation with G) only for brain-damaged pa-
tients. Compared to the other subtests, it is
only a mediocre measure of G for psychoneu- 
rotics, and a relatively poor one for schizo- 
phrenics. 

The test is virtually useless for differential- 
diagnostic purposes, since it is necessary to 
know a patient’s diagnosis before it is possible 
to know what a given score on this test re- 
flects, and even then one cannot be sure. For 
example, a low score on this test may reflect 
poor verbal ability, high distractibility, or both, 
depending on the diagnosis. Also, the score may 
be an excellent, mediocre, or poor index of the 
patient’s present general intellectual functioning,
again depending on the patient’s diagnosis.


Similarities

This test offers no ambiguity in its test func- 
tion interpretation. It loads only the Verbal 
factor consistently in the three groups, al- 
though with varying efficiency. Also, it is a 
better than average measure of G in all three 
groups. 

Wechsler [6, p. 86] and Rosenzweig and 
Kogan [5, p. 25] present rationales for Simi- 
larities which are best summarized by Rapa- 
port’s term “verbal concept formation.” Rapa- 
port [4, pp. 146-151] holds this function to 
be uniquely measured by this test. He does, 
however, point out that acceptable responses 
may be given on the basis of “well-autonomized
verbal convention.” The present findings [2] 
would support this idea. It is possible that to 
some relatively small degree, this test meas- 
ures specifically one type of verbal-manipula- 
tive ability, namely concept formation, but 
since it has relatively low uniqueness, its function
must be considered primarily one of measuring
general verbal ability. That this is reasonable
can be seen when one considers that
sheer richness of vocabulary can affect Simi- 
larities scores considerably. For example, if a 
subject’s speaking vocabulary does not include 
the words “vehicle” or “transportation,” he 
can not earn full credit on the “Wagon- 
Bicycle” item. 


Vocabulary 


The Vocabulary test is the best measure of 
the Verbal factor in the psychoneurotic and 
schizophrenic groups, but among the verbal 
subtests it is almost the poorest measure of this 
factor in the brain-damaged group. As a measure
of G, it is the best subtest for the psychoneurotics,
ranks sixth out of the twelve subtests
for the schizophrenics, and is the poorest
subtest for the brain-damaged. The differences 
among the groups are by far the greatest here, 
and an examination of their meaning is fruitful.
In the brain-damaged group, Vocabulary
is a poor measure both of present general
intellectual functioning and of present⁵ verbal
ability, while in the psychoneurotics, it is the
best measure of both present general and present
verbal ability. Thus, in an intellectually
deteriorated group, the Vocabulary test fails to
provide a highly valid index of present functioning,
which it does for a group of intellectually
nondeteriorated patients. This is highly
consonant with (although it of course cannot
prove) the Babcock hypothesis, namely that
scores on vocabulary tests, since they are quite
resistant to the effects of brain damage or mental
illness, provide estimates of premorbid level
of functioning [1]. That is, it is reasonable to
hypothesize that the reason that the Vocabulary
test measures both present general functioning
and present verbal-manipulative ability
poorly in the brain-damaged group is that it
is measuring premorbid intellectual level. The
same consideration applies for the schizophrenics,
the fact that it is a mediocre and not a
very poor measure of G here being probably
due to the fact that the general level of deterioration
in the group studied was not high.

⁵Cf. the discussion of the issue of whether tests
measure directly present versus “potential” intelligence
[4, p. 37].

If the above interpretation is accepted (it 
being essentially the Babcock hypothesis), the 
Vocabulary test becomes very useful in clinical 
psychodiagnostic practice. When a patient’s vo- 
cabulary level is appreciably higher than the 
Full Scale IQ level (the latter being a measure
of G), it may be suspected that he was previously
functioning at a higher general intellectual
level than at present, and the implication
of intellectual deterioration deserves investigation.

Apart from its utility in providing a base 
line from which to measure deterioration, the 
high regard in which this test is held by test 
constructors as a measure of general intelli- 
gence in nondeteriorated individuals [6, p. 89] 
is borne out in the present analysis [2]. For the 
neurotics (and presumably for normals) this 
test is the best measure of verbal ability, and, 
through verbal ability, of general intelligence. 

Rapaport [4, pp. 87-90] accepts the Bab- 
cock hypothesis and points out, in addition, 
that Vocabulary is dependent upon the early 
educational environment. Since he makes the 
same observation with regard to the Informa- 
tion and Similarities tests, this is not a unique 
feature of the Vocabulary test, but probably 
generally of the Verbal factor and therefore 
of all the tests which measure it. 


Picture Arrangement


This test measures the Nonverbal Organization
factor in all the groups, relatively
poorly in the psychoneurotics and schizophrenics
and quite well in the brain-damaged group.⁶
Its measurement of present general functioning
is generally poor in the three groups. The
test has low communality with the battery in
the psychoneurotics and schizophrenics, i.e., its
variance is to a considerable degree “tied up”
in measuring something which is not measured
by the other tests over and above the Nonverbal
Organization factor. This specific factor
may be, as Wechsler suggests, “social intelligence”
[6, p. 88], or, as Rapaport posits, “planning
ability” and “anticipation” [4, pp. 215-220].
The author would withhold judgment
subject to further factor-analytic investigation,
where this specific variance may be “captured”
in an interpretable common factor.


Picture Completion 

This test is found to be consistently complex,
measuring both the Verbal and Nonverbal
Organization factors in all the groups, the
latter with somewhat higher loadings than the
former. It is not a good test of either of these
factors compared with the other tests which
load them. The fact that this test measures
two abilities simultaneously makes it extremely
ambiguous in pattern-analytic interpretation. A
given low Picture Completion test score may
be a result either of low verbal or low nonverbal
organization ability, or of both, and can
therefore not be unequivocally interpreted.

The Picture Completion test is among the 
poorest measures of G in all the groups, this 
despite the fact that it loads two common factors.

Wechsler states that Picture Completion
measures “the ability of the individual to differentiate
essential from unessential details” [6,
p. 91]. The present analysis [2] offers no support
for this contention, if it is intended as a
specific function of the Picture Completion
test. Rapaport [4, pp. 231-233] considers this
test to be one of concentration acting upon visually
perceived material. This would lead operationally
to the expectation that the Picture
Completion test would correlate highly with
the Arithmetic test (which Rapaport also considers
to be a test of concentration). This expectation
is not realized; it is found that in
all three groups the Arithmetic-Picture Completion
correlation coefficients are not higher
than the correlations of Arithmetic with the
other subtests; in fact, they are among the lowest.

⁶It is possible that the Picture Arrangement test
is also complex, since its loadings for Freedom from
Distractibility in the psychoneurotic group and for
the Verbal factor in the schizophrenic group approach
significance. This would also help account for the
fact that its loadings on Nonverbal Organization are
so low for these groups.

In summary, the Picture Completion test is 
uniquely poor—it measures common factor 
functions ambiguously and present general in- 
tellectual functioning with low validity. 


Block Design 


This test too offers some ambiguity in its 
interpretation. It measures Nonverbal Organi- 
zation well in the schizophrenics and brain- 
damaged, and no other factor. In the psycho- 
neurotics, however, it measures both this fac- 
tor and also Freedom from Distractibility, 
both relatively poorly. Thus, here too, the 
knowledge of the patient’s diagnosis is neces- 
sary to approach an understanding of his score. 
Without such knowledge, one can attempt to 
determine the significance of a score on this 
test by reference to the “pure” tests of Factors
B and C. This procedure, however, is fraught
with uncertainty, given the relatively low re- 
liabilities of the tests and the appreciable in- 
tercorrelations among them [3, pp. 151-153]. 

Despite the ambiguity of common factor in- 
terpretation, this test is useful because of its 
correlation with G. In all the groups, it is at 
least the most closely related to G among the 
performance tests, and for the schizophrenics, 
it is best in this regard of all the tests. The 
Block Design test is therefore an excellent sin- 
gle measure of present general functioning 
via nonverbal means and justifies the high re- 
gard for it expressed by Wechsler [6, pp. 91- 
92], who states that this test “is one of the 
few performance tests that seemingly does 
measure very much the same sort of thing that 
verbal tests measure” [6, p. 91]. The analysis 
[2] demonstrates that “the same sort of thing” 
is the general factor. 

Both Wechsler [6, pp. 91-92] and Rapaport 
[4, pp. 249-253, 271-275] in their discussion
of the function of the Block Design test describe
it in a way which samples the ideas contained
in the present interpretation of Factor
B, the Nonverbal Organization factor. Again, 
however, this is so verbalized as to stress their 
specificity of measurement, rather than the 
communality shared with the other Factor B 
tests. 


Object Assembly 


No ambiguity in interpretation is posed by 
this test, which is found to be the best meas- 
ure of the Nonverbal Organization factor in 
all three groups, and to measure no other fac- 
tor. Indeed, it measures this factor better 
than any other subtest measures any common 
factor. Its consistently low uniqueness indi- 
cates that it measures little other than Factor 
B. 

On the other hand, it is on the whole the 
poorest measure of present general intellectual 
functioning, ranking lowest in all the groups. 
This is a consequence of the fact that it is so 
excellent a measure of Factor B which among 
the common factors consistently bears the low- 
est relationship to G. Wechsler [6, p. 98] also 
points out that this test correlates poorly with 
other tests and therefore with general ability. 

Both Wechsler [6, p. 98] and Rapaport 
[4, pp. 249-259] consider this test as primarily 
useful for qualitative appraisal of work meth- 
ods. Rapaport assigns as Object Assembly’s 
main function the measurement of visual or- 
ganization, which is included in the interpre- 
tation of Factor B. 


Digit Symbol 


The final test again raises the problem of 
ambiguity. It is essentially a measure of Free- 
dom from Distractibility, loading this factor 
significantly in two groups and suggestively 
high in the brain-damaged group. The ambig- 
uity resides in the fact that in the latter group 
it is as good a measure of the Nonverbal Or- 
ganization factor as it is of Freedom from Dis- 
tractibility in the other two groups [2]. One 
can speculate that the reason for this resides 
in the fact that a greater part of the variance 
of this test in the brain-damaged is associated 
with visual organization and simple speed than 
is the case with the non-brain-damaged. 

The Digit Symbol test is a moderately good 
measure of present general intellectual functioning,
being second only to Block Design
among the tests of the Performance Scale. It 
is particularly valid as a measure of G for the 
schizophrenics, where it is second only to Block 
Design; in the other groups, it assumes a me- 
dian position in this regard. 

Rapaport [4, pp. 288-291], in discussing the 
function of the Digit Symbol test, stresses pri- 
marily the motor activity involved, secondarily 
the visual organization process, and lastly men- 
tions its learning function, indicating that this 
learning is of an “attention” character. The
last is consonant with the present finding, i.e., 
that it is essentially a measure of Freedom 
from Distractibility. As noted, its factorial va- 
lidity as a measure of visual organization or 
allied functions holds only for the brain-dam- 
aged. Insofar as its motor characteristics are 
concerned, these may reside in its specificity, 
but at best these would be of secondary im- 
portance. 


Summary and Conclusions 


The results of a comparative factor analysis 
of the Wechsler-Bellevue subtests in groups of 
psychoneurotics, schizophrenics, and brain- 
damaged patients [2] led to a rationale for the 
subtests which was presented in the present ar- 
ticle. Each subtest’s function in terms of three 
common factors (Verbal, Nonverbal Organi- 
zation, and Freedom from Distractibility) and 
a second-order factor (present general intel- 
lectual functioning—G) was discussed and 
compared with the test functions assigned by 
Wechsler and Rapaport. The following gen- 
eral conclusions are drawn: 

1. Some of the Wechsler-Bellevue sub- 
tests—Arithmetic, Picture Arrangement, 
Block Design, and Digit Symbol—do not 
measure the same common factor or combination
of common factors in different neuropsychiatric
groups. Thus, one must know the patient’s
diagnosis in order to know what common
factor or factors the test is measuring in
him. Another problematic test is Picture Com- 
pletion which is complex, measuring both the 
Verbal and Nonverbal Organization factors in 
all the groups. A given score on this test is 
ambiguous in that it may reflect the operation 
of either or both of these factors. Finally, even 
when a test does measure a single common fac- 
tor function, it may do so with varying valid- 
ity in different groups; the same consideration 
holds for a test’s measurement of present gen- 
eral intellectual functioning. Considerations 
such as these sharply limit the clinician in his 
attempt to interpret objectively a given sub- 
ject’s score on a Wechsler-Bellevue subtest. 

2. Much of the test rationales of Wechsler 
and Rapaport is not supported in the present 
factor-analytic rationale. These authors imply 
a specificity of measurement for each of the 
subtests which is untenable in the light of the 
appreciably high order of subtest intercorrela- 
tion. The latter leads to test communalities 
whose magnitude, together with the relatively 
low reliabilities, precludes the possibility of the 
subtests measuring specific factors to any sig- 
nificant degree, at least in patient populations. 
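The argument in conclusion 2 rests on the usual factor-analytic partition of a test's variance, in which reliability is taken as the sum of communality and specificity, so that specificity can be no larger than reliability minus communality. The following is a minimal sketch of that arithmetic only; the function name and the numerical values are hypothetical illustrations and are not taken from the analysis reported in [2].

```python
def specific_variance(reliability: float, communality: float) -> float:
    """Reliable variance left over after the common factors are removed.

    Assumes the classical partition of unit test variance:
        1 = communality + specificity + error
        reliability = communality + specificity
    so specificity = reliability - communality (floored at zero).
    """
    for name, value in (("reliability", reliability), ("communality", communality)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie between 0 and 1")
    return max(0.0, reliability - communality)


# Hypothetical values, chosen only to illustrate the argument:
# a subtest with modest reliability and high communality leaves
# little room for a specific factor.
example = specific_variance(reliability=0.70, communality=0.60)
print(f"specific variance: {example:.2f}")
```

With figures of this kind, only about a tenth of the subtest's variance could reflect anything specific to it, which is the sense in which high communalities and low reliabilities jointly limit specificity.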


Received December 19, 1951. 
“Subtlety” in Structured Personality Tests


William Seeman 


Mayo Clinic, Rochester, Minnesota 


In an early paper on projective technics 
Frank argued that a defining property of such 
a technic consists in this distinctive feature of 
the stimulus situation: 


[It is] designed or chosen because it will mean 
to the subject, not what the experimenter has arbi- 
trarily decided it should mean (as in most psycho- 
logical experiments using standardized stimuli in 
order to be “objective”), but rather whatever it 
must mean to the personality who gives it, or im- 
poses upon it, his private, idiosyncratic meaning 
and organization [2, p. 403]. 


In a later review of the nature and theory 
of projective methods Sargent correctly recog- 
nized that “the very wording of the above def- 
inition implies a controversy: it presents pro- 
jective techniques not only as an addition to 
our present stock of instruments; it also implies
that they are set up in opposition to some- 
thing” [6, p. 257]. This “opposition” charac- 
ter is observed frequently both in the litera- 
ture and in discussion among clinical psychol- 
ogists. For example, Hutt stated that struc- 
tured tests are characterized by the presence of 
“culturally crystallized questions” which, it is 
assumed, “will have the same meaning to all
subjects” [4, p. 135]. A very considerable defect
which Hutt assigned to such structured
tests lies in their presumed failure to “offer 
access to the personality make-up or to its pro- 
cesses” [4, p. 136]. 


It is the “meaning” property of the stimuli
in structured personality tests which is the 
subject of this investigation, and the term 
“subtlety” will be used to refer to the degree 
to which this “meaning” can or cannot be ar- 
bitrarily assigned in a priori fashion. As an 
example consider two items from the Minne- 
sota Multiphasic Personality Inventory 
(MMPI): “It takes a lot of argument to convince
most people of the truth” and “I have
a habit of counting things that are not important,
such as bulbs on electric signs and so
forth.” To the extent that the psychodynamic 
meaning of the second item can easily be es- 
tablished with a high degree of interpersonal 
agreement (that is, most individuals who have 
had the requisite psychological or psychiatric 
training would agree that this is an obsessive- 
compulsive mode of defense) whereas this is 
not true of the first item, the first item would 
by definition be properly characterized as 
“more subtle” than the second. For experi- 
mental purposes this definition will prove ade- 
quate although it does not provide for a com- 
plete ordering of a hierarchy of “subtlety” 
should occasion ever require such an ordering. 
But in this respect such items are no different 
from projective instruments, as will soon be 
apparent to anyone who tries to order in a 
hierarchy of “subtlety” such instruments as the 
Bender Gestalt, the various sentence comple- 
tion tests, the Thematic Apperception Test, 
and the Rorschach; or who tries to establish 
such a hierarchy for the TAT cards. It seems 
clear, however, that “subtlety” is a quality 
which cannot be dichotomized in all-or-none 
fashion, but rather that there are degrees of 
this property. 

It is this property of “subtlety” in which 
structured personality instruments have been 
commonly presumed to be deficient. And, in- 
deed, those structured tests in which the items 
operate as self-ratings and behavior-surrogates, 
and to which it is presumed that the psycho- 
dynamic significance can be assigned a priori 
may well be lacking in such “subtlety.” It is, 
however, most essential to recognize that this 
approach to structured personality instruments 
constitutes an historical accident rather than a 
defining property of such tests. In a theoretical 
analysis of the dynamics of structured person- 
ality tests Meehl has this to say about the self- 
rating a priori approach: 







Associated with this approach to structured per- 
sonality tests is the construction of items and their 
assembling into scales upon an a priori basis, re- 
quiring the assumption that the psychologist build- 
ing the test has sufficient insight into the dynamics 
of verbal behavior and its relation to the inner 
core of personality that he is able to predict before- 
hand what certain sorts of people will say about 
themselves when asked certain sorts of questions. 
The fallacious character of this procedure has been 
sufficiently shown by the empirical results of the
Minnesota Multiphasic Personality Inventory alone. 
... It is suggested tentatively that the relative use- 
lessness of most structured personality tests is due 
more to a priori item construction than to the fact 
of their being structured [5, p. 297]. 


An alternative approach to the building of 
structured personality or psychiatric instru- 
ments is one which considers all items to be 
stimuli, the nontest correlates (that is, psychodynamic
“meaning”) of which must be laboriously
searched out experimentally. Since such
a procedure repudiates the a priori assignment 
of “meaning,” it is obvious that it opens the 
way for the introduction of “subtle” items 
into the test. 


All the items used in the investigation to 
be reported were taken from a structured 
test which was built along these empirical 
lines, the MMPI. From this test were selected 
30 items; of these, 15 were “obvious” items 
such as the “counting” item referred to earlier 
in this paper. They were designated as “obvi- 
ous” items because their “psychological mean- 
ing” was considered to be quite clearly recog- 
nizable by psychologically trained individuals. 
The other 15 items were designated “subtle” 
items because it was considered that their 
“meaning” and psychological significance 
would not be detected by the same persons. 
The items, the manner of presentation, and the 
directions are described in the following para- 
graphs: 


Part 1—Directions: This is an experiment in 
which your assistance is being enlisted. The prob- 
lem is to identify items which, in your opinion, 
would be of diagnostic value on some psychiatric 
scale of a structured type of psychological test. A 
“psychiatric scale” is to be regarded as a category 
such as “schizophrenia,” “hysteria,” “depression,” 
and so forth. 

Read each item and decide whether the answer 
“true” or “false” to that item would render the item 
of value in any of the psychiatric categories listed 
with it. If you decide it belongs on a certain diagnostic
scale, put an X before that category. If you
think the item would be of no value for any of the 
categories listed, place the X before “none.” As an 
example consider the item: “I am happy most of 
the time.” The answer “false” would help to iden- 
tify this as a depressed person and so would be- 
long on the “Depression” scale of the test. 


The items in part 1 are as follows: 


1. I believe I am being plotted against.
....a. Depression ....b. Obsessive-compulsive
....c. Paranoid schizophrenia ....d. Hysteria ....e. None

2. I have had periods in which I carried on activities
without knowing later what I had been doing.
....a. Schizophrenia ....b. Paranoid state or condition
....c. Hysteria ....d. Obsessive-compulsive ....e. None

3. Peculiar odors come to me at times.
....a. Hypomania ....b. Schizophrenia ....c. Depression
....d. Obsessive-compulsive ....e. None

4. Much of the time my head seems to hurt all over.
....a. Paranoid state ....b. Obsessive-compulsive
....c. Depression ....d. Hysteria ....e. None

5. I am neither gaining nor losing weight.
....a. Hysteria ....b. Depression ....c. Hypomania
....d. Schizophrenia ....e. None

6. It takes a lot of argument to convince some people
of the truth.
....a. Paranoid schizophrenia ....b. Obsessive-compulsive
....c. Paranoid state ....d. Hysteria ....e. None

7. Most people will use somewhat unfair means to
gain profit or an advantage rather than lose it.
....a. Hysteria ....b. Depression ....c. Hypomania
....d. Schizophrenia ....e. None

8. Most nights I go to sleep without thoughts or
ideas bothering me.
....a. Hypomania ....b. Paranoid schizophrenia
....c. Obsessive-compulsive ....d. Hysteria ....e. None

9. I almost never dream.
....a. Depression ....b. Obsessive-compulsive
....c. Paranoid state ....d. Hypomania ....e. None

10. I often feel as if things were not real.
....a. Hysteria ....b. Schizophrenia ....c. Hypomania
....d. Paranoid state ....e. None

11. I have a habit of counting things that are not
important such as bulbs on electric signs and so
forth.
....a. Obsessive-compulsive ....b. Hysteria
....c. Schizophrenia ....d. Mania ....e. None

12. Someone is hypnotizing me and has control of
my mind.
....a. Depression ....b. Hysteria ....c. Obsessive-compulsive
....d. Paranoid schizophrenia ....e. None

13. I commonly wonder what hidden reason a person
may have for doing something nice for me.
....a. Obsessive-compulsive ....b. Hypomania
....c. Hysteria ....d. Depression ....e. None

14. Sometimes some unimportant thought will run
through my mind and bother me for days.
....a. Obsessive-compulsive ....b. Hysteria
....c. Schizophrenia ....d. Depression ....e. None

15. I get mad easily and then get over it.
....a. Hysteria ....b. Paranoid state ....c. Hypomania
....d. Obsessive-compulsive ....e. None

16. I am very seldom troubled by constipation.
....a. Hysteria ....b. Depression ....c. Hypomania
....d. Obsessive-compulsive ....e. None

17. I like to flirt.
....a. Hysteria ....b. Paranoid schizophrenia
....c. Depression ....d. Hypomania ....e. None

18. I can be friendly with people who do things
which I consider wrong.
....a. Obsessive-compulsive ....b. Depression
....c. Hypomania ....d. Hysteria ....e. None


Part 2—Directions: In each of the items below 
indicate how the item should be scored for the category
mentioned. In every instance T means True,
F means False, and ? means Cannot say. For ex- 
ample, the item below has the F encircled to indi- 
cate that the answer False is scored in the direc- 
tion of Depression. 


“I seem to be as capable and smart as most oth- 
ers around me.” T (F) ? 

1. The item “Most people inwardly dislike putting
themselves out to help other people” would be 
scored as a paranoid answer if answered T F ? 


2. The item “I wish I could be as happy as others 
seem to be” would be scored as a depressed an- 
swer if answered T F ? 


3. The item “I never worry about my looks” would 
be scored for hypomania if answered T F ? 


4. “I dream frequently about things that are best 
kept to myself” would be scored as depressive if 
answered T F ? 


5. “Someone has been trying to influence my mind” 
would be scored for paranoid schizophrenia if
answered T F ?

6. “I do not blame a person for taking advantage 
of someone who lays himself open to it” would be 
scored in the direction of depression if answered 
T F ?

7. “Some people are so bossy that I feel like doing 
the opposite of what they request even though I 
know they are right.” This item would be scored 
as paranoid if answered T F ? 


8. “Much of the time I feel as if I have done some- 
thing wrong or evil” would be scored as obsessive- 
compulsive if answered T F ? 


Table 1

Success and Failure in Identifying Appropriate Psychiatric Scales as


Item§        D‡    Hy‡    Pa‡   O-C‡  PaSc‡    Sc‡    Ma‡  None     P†

1            1      1             2     53*                   1    <.01
2                  24      0      5            26*            3    <.01
3            3                   12            31*     2     10    <.01
4           10     37*     3      0                           8    <.01
5(S)        10      8                           0      2*    38    <.01
6(S)                5*    24      7     11                   13    <.01
7(S)        11      7*                         10      2     28    <.01
8                   0            29*     2             5     22    <.01
9(S)         5             5     12*                   6     30    <.01
10                  2      2                   37*     4     13    <.01
11                  0            56*            0      1      1    <.01
12           0      2             3     53*                   0    <.01
13(S)       17      1*            8                    0     32    <.01
14           2      2            54*            0             0    <.01
15(S)               8*     3      2                   31     14    <.01
16(S)              16*            1                    4     26    <.01
17(S)        1*     4                    8            26     19    <.01
18(S)        4      6*           13                    9     26    <.01


*Indicates appropriate scale using MMPI as criterion.

†Probability of obtaining the indicated distribution for each item assuming theoretical N = one fifth of total
N for each choice in that item.

‡Depression, hysteria, paranoid state or condition, obsessive-compulsive,
paranoid schizophrenia, schizophrenia, mania or hypomania.

§(S) indicates “subtle” items.


9. “I usually have to stop and think before acting
even in trifling matters” would be scored obsessive-compulsive
if answered T F ?


10. “I drink an unusually large amount of water
every day” would be scored for hysteria if answered
T F ?


11. “Bad words, often terrible words, come into my
mind and I cannot get rid of them.” This item
would be scored as obsessive-compulsive if answered
T F ?


12. “Most people are honest chiefly through fear of
being caught” would be scored in the paranoid direction
if answered T F ?


The subjects in this investigation were 58 
students in a graduate and advanced under- 
graduate course in clinical psychology at the 
University of Minnesota. The prerequisites for 
this course included, among other courses, two 
quarters of abnormal psychology in which the 
standard nosological categories are described 
and studied at some length. Three of the stu- 
dents nevertheless did not fulfill these require- 
ments and in some of the statistical tests their 
responses were not considered. It will be obvi- 
ous, however, that the inclusion or exclusion 
of these three members of the class would in 
no way change the results of the test. 


Results 


Table 1 presents the responses of the sub- 
jects to the first 18 items, indicating the man- 
ner in which each item was assigned to the 
psychiatric categories. Since the asterisk indi- 
cates the MMPI scale from which the item 
was actually taken, it is clear that “‘success” in 
identification of the appropriate scale will be 
indicated when the number of students assign- 
ing the item to that scale exceeds the number 
assigning the item to any other scale. It might 
be fruitful before proceeding, to make a brief 
analysis of how these items may be expected to 
behave in the light of the concept of “subtle- 
ty.” It is, in the first place, quite obvious that 
the failure of those items designated as 
“subtle” to behave in a discriminably different 
manner from the “obvious” items would (at 
least so far as this experiment is concerned) 
lend little support to the validity of the con- 
cept. There are, however, several ways in 
which the concept may be experimentally sup- 
ported. In the first place, one would expect 
that the “none” category should be used with
a significantly greater frequency for the
“subtle” items than for the “obvious” items, 
either in consequence of the “innocuous” char- 
acter of some of the items or in consequence of 
the ambiguity of their “meaning.” A second and
obvious way in which the concept would re- 
ceive experimental support has already been, in 
part, indicated; that is, a significantly greater 
success is to be expected in the assignment of 
psychiatric categories for the “obvious” items 
than for the “subtle” items. Third, a chance
assignment to the psychiatric categories for the 
“subtle” items might be expected in some in- 
stances, in consequence of the ambiguity of 
“meaning.” Finally, if one were to consider 
the 15 “subtle” and the 15 “obvious” items as 
two separate subtests, one would expect a sig- 
nificantly different distribution of scores, con- 
sidering the number of correct identifications 
as a score.
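The criterion of "success" described above can be stated as a simple rule: an item is successfully identified when the starred (MMPI criterion) scale receives more assignments than any other choice offered for that item. The sketch below is one reading of that rule, assuming that "none" counts as a competing choice (the comparison of "subtle" and "obvious" items in the text appears to require this). The function name and the dictionary layout are hypothetical; the counts are the entries for items 1 and 7 as read from Table 1.

```python
from typing import Mapping


def identified_successfully(counts: Mapping[str, int], criterion: str) -> bool:
    """True when the criterion scale drew more assignments than every other choice.

    `counts` maps each choice offered for an item (including "None")
    to the number of students who marked it.
    """
    target = counts[criterion]
    return all(target > n for choice, n in counts.items() if choice != criterion)


# Item 1 ("obvious"): criterion scale is paranoid schizophrenia (PaSc).
item_1 = {"D": 1, "Hy": 1, "O-C": 2, "PaSc": 53, "None": 1}

# Item 7 ("subtle"): criterion scale is hysteria (Hy).
item_7 = {"D": 11, "Hy": 7, "Sc": 10, "Ma": 2, "None": 28}

print(identified_successfully(item_1, "PaSc"))
print(identified_successfully(item_7, "Hy"))
```

Applied to these two rows, the rule marks the "obvious" item as a success and the "subtle" item as a failure, which is the pattern the text goes on to report for the table as a whole.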

There can be no question that Table 1 does 
provide considerable experimental support for 
the concept of “subtlety” in structured items. 
Comparison of the frequencies in the starred 
categories for “subtle” and “obvious” items re- 
veals that in every instance the frequencies, for 
the “obvious” items, in the correct category, 
exceed the frequencies in any other category, 
whereas there are no instances in which this 
is true of the “subtle” items. It does not ap- 
pear to strain matters to characterize these re- 
sults as striking. Furthermore, analysis of the 
frequencies in the category “none” reveals that 
whereas the “subtle” items were assigned here 
43.3 per cent of the time, the “obvious” items 
were so assigned only 11.1 per cent of the 
time; this is a difference significant at the .01 
level, and in the direction required by this 
formulation. 


Table 2 

Direction of Responses to “Subtle” and “Obvious” 
Items Indicated by 58 Students in 

Item       T      F      ?      P
 1(S)     50      2*     6    <.01
 2        55*     2      1    <.01
 3        41*     7     10    <.01
 4(S)     46      4*     8    <.01
 5        56*     2      0    <.01
 6(S)     18     24*    16    >.05
 7(S)     37      7*    14    <.01
 8        51*     6      1    <.01
 9        45*     7      6    <.01
10(S)     41      4*    13    <.01
11        31*     1     26    <.01
12(S)     48      5*     5    <.01

*Indicates correct scoring on MMPI criterion. 

Table 2 presents similar data for the 12 
items in Part 2. Again we note that in every 
instance the responses to “obvious” items are 
correctly assigned as judged by the criterion. 
For the “subtle” items five of the six are in- 
correctly assessed; one (item 6) meets the pre- 
viously indicated criterion of chance distribu- 
tion of responses, the chi square being 1.79 
with 3 df. Here, again, the evidence is 
strikingly in support of the concept of “subtle- 
ty” as defined in this paper. 
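
The chi-square value quoted for item 6 can be checked from the counts printed in Table 2. The sketch below assumes that "chance distribution" means equal expected frequencies in the three response columns; the function name is illustrative and is not taken from the paper. With three categories the usual degrees of freedom would be 2, so the "3 df" above is simply retained as printed.

```python
def chi_square_uniform(observed):
    """Chi-square statistic against equal expected frequencies (assumed null)."""
    n = sum(observed)
    expected = n / len(observed)
    return sum((o - expected) ** 2 / expected for o in observed)

# Counts for item 6 as printed in Table 2 (58 students in all).
item_6 = [18, 24, 16]
chi2 = chi_square_uniform(item_6)
print(round(chi2, 2))  # 1.79
```

Under these assumptions the statistic agrees with the value reported in the text.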


Table 3 

Distribution of Scores on “Subtle” and “Obvious” Subscales 

              Part 1                    Part 2
Score   “Subtle”  “Obvious”       “Subtle”  “Obvious”
  9                   4
  8                   9
  7                  23
  6                  12               1        30
  5         1         4               0        17
  4         2         1               0         6
  3         2         2               2         2
  2        10                         6
  1        18                        18
  0        22                        28


Table 4 

Breakdown of “Obvious” and “Subtle” Scores by Randomization Procedures 

             Part 1                             Part 2
        “Subtle”  “Obvious”               “Subtle”  “Obvious”
>5.5        0        23           >3.5        1        25
<5.5       23         4           <3.5       27         2
        χ² = 41.1                         χ² = 44.0
        df = 1                            df = 1
        p < .01                           p < .01





What, now, is the nature of the evidence 
when the items are considered as two separate 
subscales? The answer to this question, as in- 
dicated in Tables 3 and 4, is quite unequivo- 
cal. Table 3 presents the distribution of scores 
for “subtle” and “obvious” subscales for both 
parts of the “test.” It is obvious immediately 
that the distributions are virtually nonover- 
lapping in character. Since the distributions are 
so obviously skewed it seemed wise not to ap- 
ply the test of significance of difference between 
means, which involves the assumption of nor- 
mality.¹ It is, however, possible and appropri- 
ate to apply the chi-square test, provided suit- 
able precautions are taken to ensure independ- 
ence of categories. This was done by breaking 
down the entire group into two random sub- 
groups. The data for all items are presented 
in Table 4, and it is clear that the differences 
are in the required direction and well beyond 
the .01 level of significance. 
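
As an illustration of the fourfold chi-square test described here, the following sketch applies the ordinary Pearson formula, without continuity correction, to the Part 2 cells of Table 4 as printed (1, 25, 27, 2). The paper does not state which formula or rounding was used, so this is an assumption; under it the statistic comes to about 43.7, close to but not identical with the printed 44.0.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square, no continuity correction, for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator

# Part 2 of Table 4 as printed: rows are scores above / below 3.5,
# columns are the "subtle" and "obvious" subscales.
part2_chi2 = chi_square_2x2(1, 25, 27, 2)
print(round(part2_chi2, 1))  # 43.7
```

With 1 df, any value of this size lies far beyond the .01 level, which agrees with the conclusion drawn in the text.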

We may take it, then, that the phenomenon 
of “subtlety” in structured psychological in- 
struments is an experimentally verifiable one; 
or, alternatively stated, that the concept of 
“subtlety” can be shown to have an empirical 
basis. The significance of this has been dis- 
cussed at some length by Meehl [5] and has 
been noted by Weiner [7] and by Hathaway 
and McKinley [3]; it therefore requires no 
extended further discussion here. Suffice it to 
underscore the point that those features which 
appear to have been primarily accountable for 
the frequently noted weaknesses and failures 
of structured psychological tests were to a con- 
siderable extent a function of the a priori con- 
struction procedures which ipso facto rule out 
the concept of “subtlety” of items. 

There remain, however, two problems of 
such significance that they must be disposed of 
at this point. Let us consider two hypothetical 
critics, the first of whom argues in the follow- 
ing manner: it is true that the items designated 
as “subtle” behave in an experimentally specifi- 
able fashion differently from the “obvious” 
items. This, however, may simply show that 
they are personologically without validity, 
whereas the “obvious” items are valid. And 
indeed, it is a somewhat analogous argument 
which Allport [1] makes. Our hypothetical 
critic goes on to assert that one can obviously 
see that the items are without psychological or 
psychiatric significance. To such an argument 
we can only reply that it defines validity as 
face validity, requires the construction of 
structured tests in a priori fashion, and leaves 
no room for “subtlety” of items. Our second 
hypothetical critic, however, cannot be disposed 
of in such a cavalier fashion. His argument is 
something like this: the experimental verifica- 
tion of the concept of “subtlety” rests, in the 
last analysis, on the assumption of the validity 
of the criterion. What if the MMPI were not 
a valid test? Our answer to this would be an 
unhesitating denial that the validity of the 
MMPI is crucial to the validity of the con- 
cept advanced in this paper. At most, the in- 
validity of the MMPI (were it demonstrated) 
would indicate that these specific items might 
be faulty. It seems unlikely that our hypotheti- 
cal critic will wish to maintain that in prin- 
ciple and for theoretically weighty reasons it 
would be impossible to build a valid multi- 
phasic instrument. But with the possibility of 
building such an instrument the experimental 
validity of the concept of “subtlety” remains 
secure. 

¹These figures may nevertheless be worthy of a 
footnote. For Part 1 these are as follows: Mean “ob- 
vious” = 6.74, “subtle” = 1.03; SD “obvious” = 
1.35, “subtle” = 1.27. This difference is significant 
beyond .01. For Part 2: Mean “obvious” = 5.3, 
“subtle” = .8; SD “obvious” = 1.14, “subtle” = 
39. 

Summary 


It was the aim of this paper to determine 
whether experimental evidence could be ad- 
duced to give empirical content to the concept 
of “subtlety” in structured personality and 
psychiatric instruments. To that end, 15 items 
designated as “subtle” (in the sense that the 
content of the items did not appear to reveal 
their psychodynamic “meaning”) and 15 items 
designated as “obvious” were selected from 
the MMPI, a widely used structured instru- 
ment based on psychiatric categories. These 
were assembled in two subsets and adminis- 
tered to 58 students of clinical psychology. 
The evidence presented is regarded as unequiv- 
ocally supporting the concept of “subtlety” in 
structured items. 


Received November 17, 1951. 
Personality Correlates of Q—L Variability 
on the ACE 

University of California, Santa Barbara College 


It seems reasonable to expect that individ- 
ual variability along culturally significant con- 
tinua should color one’s personality structure. 
It is not necessary to haggle over the point of 
origin of the variability, i.e., whether cultur- 
ally derived or a function of gene structure, 
to accept such an assumption. It is only neces- 
sary at this time to concede the strong prob- 
ability that any significant individual varia- 
tion in general intellectual functioning, special 
aptitude, physique, endocrine balance, texture 
of skin—all should be reflected in that bio- 
social, integrated matrix of reactions which is 
labeled one’s personality structure, regardless 
of the originating point of the variability. 

The specific problem to which the present 
paper is addressed concerns the effect of varia- 
tion in quantitative versus linguistic abilities 
on the personality structure of college women, 
when the two types of abilities and personality 
are defined in terms of structured psychome- 
trics. Quantitative and linguistic abilities are 
defined in terms of scores on the American 
Council on Education Psychological Examina- 
tion (ACE); and personality structure is de- 
fined by answers given to the group form of 
the Minnesota Multiphasic Personality In- 
ventory (MMPI). It must be admitted that 
the ACE is a somewhat fallible measure of the 
posited dichotomy between quantitative and 
linguistic abilities. Even more freely will it be 
admitted that the MMPI measures only one 
segment of that total complex which is called 
personality. Nevertheless, both measures have 
the advantage of yielding numerical data 
which may be treated objectively. 

¹The present study was subsidized by the Re- 
search Committee of the University of California, 
Santa Barbara College. Indebtedness is also ex- 
pressed to Dr. Jerry H. Clark, Registrar, Santa 
Barbara College, for aiding in certain preliminary 
aspects of the study. 


No one presently knows the exact number 
of independent intellectual abilities or factors 
possessed by man. Thurstone, for example, 
prefers an arbitrary seven. Yet he has been 
the senior author for many years of the ACE, 
which yields two part scores, one quantitative 
and one linguistic. Likewise, the College En- 
trance Examination Board has published many 
editions of its Mathematics Aptitude and Ver- 
bal Aptitude Tests. Regardless of the factorial 
purity of these several measures, it appears 
that in the testing procedures of the ACE and 
the CEEB two separate abilities or congeries 
of abilities are recognized as being (a) rela- 
tively independent if not completely so, and 
(b) of paramount importance to one’s academ- 
ic success in college. It might also be men- 
tioned that the Army General Classification 
Test of World War II sampled these same 
two abilities, linguistic and quantitative; in 
addition, a spatial factor was presumably meas- 
ured by items devoted to the counting of 
blocks. In sum, then, it may be said that test 
construction has placed the stamp of approval 
on the independence or relative independence 
of two large sectors of one’s intellectual abil- 
ities, the efficiency of thinking in terms of 
quantities and the efficiency in dealing with 
less precise, vaguer, frequently affectively 
toned verbal constructs. 

F. L. Wells [4], his outlook apparently col- 
ored by the tradition of the ACE and the 
CEEB in separating tests of the quantitative 
and of the verbal, reports on a revision of the 
Army Alpha of World War I into four sub- 
tests measuring quantitative, and four measuring 
verbal, abilities. A later article [3] shows that 
he feels these two intellectual variables to be 
quite significantly related to personality attri- 
butes. He makes a number of points in this 
second article which are germane to the pres- 
ent study. He seems to feel that a deficiency in 
quantitative ability is the result of a basic at- 
titude—an individual’s being repelled by the 
invariance and rigidity of number, being at- 
tracted to the elasticity and connotative nu- 
ances of words. In concrete terms, the verbal- 
ist prefers “He is a tall man,” to “He is 1.91 
meters in height.” The latter statement is too 
limiting, too exact, too obtrusive. 


Wells feels that verbalism (as he calls it) 
is no more an entity than is schizophrenia, but 
is, instead, more of a temperamental charac- 
teristic than an intellectual one. There should 
be little difference between the verbalist and 
the one who thinks quantitatively, so far as 
general adjustment is concerned; perhaps, he 
continues, the verbalist might, theoretically, be 
more in accord with the norms of cultural 
prestige, which place a premium on nicety of 
diction in speaking and writing. In a footnote 
he takes one exception to this putative expec- 
tancy: Those who major in creative work in 
the several Arts have less well-integrated per- 
sonalities. In general, one should expect an ap- 
proximate zero relationship between the quan- 
titative-verbalist tendencies and general ad- 
justment. Finally, Wells quotes, seemingly 
with complete approval, a statement by Adler 
to the effect that the quantitatively oriented 
individual is more self-sufficient, while the ver- 
bal one tends to seek security in social rela- 
tions. “This,” Wells continues [3, p. 72], “ties 
in with the more social tendency observed gen- 
erally in the ‘Verbal Facility’ group, as well 
as a special need for ‘belonging’ in this par- 
ticular man.” In the same paper, Wells gives 
two case studies. For both cases, verbal ability, 
as measured by tests, was considerably higher 
than quantitative. As case studies they are in- 
teresting; and the discussion relative to the 
cases serves as a yeasty matrix for research on 
personality correlates of Q—L differentials. 

In 1946, Munroe published a paper [2] on 
Rorschach differences found at Sarah Law- 
rence between 40 women with relatively high 
quantitative abilities (Q five or more percent- 
age points higher than L on the ACE), in con- 
trast with 40 women whose L was consider- 
ably higher than Q—44 percentage points or 
more. Munroe’s findings were, briefly, these: 
(a) the response total (R) of the two groups 
was approximately the same; (b) form was 
used much more frequently by the Q women; 
(c) the L women made more extensive use of 
M, or human movement; (d) there was more 
poor form among the L group; (e) when the 
Munroe check list of adjustment was used, no 
significant group differences appeared—adjust- 
ment factors for the Q—L groups were rough- 
ly the same. Munroe’s generalization of her 
findings was to the effect that the L women 
showed a more subjective approach, the Q 
women were “more bound to a rather literal 
construction of objective reality.” As nearly 
as can be ascertained from her data, the most 
discriminating single item was the use of M on 
Card IV; only two of the 40 Q women pro- 
jected human movement on this card while 
“over half” of the L women made such a pro- 
jection. 


Method 


The present study was confined to freshmen 
women since they heavily outnumbered the 
men among the entering students at Santa Bar- 
bara College in September, 1948. Through the 
Registrar’s office, data were available on the 
MMPI and ACE for 200 of these freshmen 
women. The raw Q scores earned by these 200 
women on the ACE were converted to stand- 
ard scores with a mean of 50, a sigma of ten; 
the same procedure was followed for the L 
score. Subsequently, the total group was brok- 
en up into quarters (Q) of 50 each. Q4 repre- 
sented those women with the greatest excess 
of Q over L. Q3 and Q2 represented the mid- 
dle ranges of Q—L disparity; while in the Q1 
group one found the 50 women with lowest 
Q score in relation to their L. The Yes re- 
sponses to each individual item composing the 
group form of the MMPI were tabulated for 
these four equal groups. The resulting tabu- 
lations were inspected for gross differences be- 
tween the Yes responses for the highest and 
lowest quarters in Q—L discrepancy. An item 
was retained only if the Yes answers of the 
intermediate groups, Q3 and Q2, deviated in 
the same direction as they did for the Q4—Q1 
groups. This precaution seemed necessary be- 
cause the probable low reliability of individual 
T-F items in the MMPI would otherwise ef- 
fect the inclusion of a number of items of no 
validity whatever, purely by chance, since there 
are over 500 items in the total test. Forty- 
three items met this rule-of-thumb type of devi- 
ation for the respective quarters, Q4—Q1, and 
Q3—Q2. The linear r between the Q—L dis- 
parities of the 200 women on the ACE and 
the resultant score from the MMPI was .63, 
a figure which is, of course, spuriously high, 
owing to the manner of selecting the items. 


Table 1 

The Significance of the Differences Between the Answers of 100 High Q and 100 High L 
College Women to Certain Categorized Items on the Group Form of the 
Minnesota Multiphasic Personality Inventory 

Item number,  t value   Marking
Group MMPI    of diff.  characteristic  Categories and MMPI Questions
                        of Q women

Sexual Inversion or Immaturity
   324         3.60     False           I have never been in love with anyone.
    69         3.50     True            I am very strongly attracted by members of my own sex.
   208         2.86     False           I like to flirt.
   435         2.17     True            Usually I would prefer to work with women.
   548         1.86     True            I never attend a sexy show if I can avoid it.
   101         1.40     True            I believe women ought to have as much sexual freedom as men.

Religiose or Rose-Colored-Glasses Attitudes
    45         3.43     False           I do not always tell the truth.
   118         2.67     False           In school I was sometimes sent to the principal for cutting up.
   491         2.60     False           I have no patience with people who believe that there is only one
                                        true religion.
   277         2.40     False           At times I have been so entertained by the cleverness of a crook
                                        that I have hoped he would get by with it.
   232         2.00     True            I have been inspired to a program of life based on duty which I
                                        have since carefully followed.
   373         1.86     True            I feel sure that there is only one true religion.
   249         1.71     True            I believe there is a Devil and a Hell in afterlife.
    26         1.67     False           I feel that it is certainly best to keep my mouth shut when I’m in
                                        trouble.

Antipathy Toward the Verbal
   546         3.71     False           I like to read about history.
   428         2.86     False           I like to read newspaper editorials.
   204         2.71     False           I would like to be a journalist.
   260         2.60     True            I was a slow learner in school.
   304         2.00     True            In school I found it very hard to talk before the class.
   552         1.7      False           I like to read about science.

Obsessive-Compulsive Tendencies
   270         4.14     False           When I leave home I do not worry about whether the door is locked
                                        and the windows closed.
   131         3.00     False           I do not worry about catching diseases.
   524         2.80     False           I am not afraid of picking up a disease or germs from doorknobs.
   160         2.00     False           I have never felt better in my life than I do now.

Anxiety Syndrome
   337         2.60     True            I feel anxiety about something or someone almost all the time.
   328         2.40     True            I find it hard to keep my mind on a task or job.
   345         2.20     True            I often feel as if things were not real.
   549         1.84     True            I shrink from facing a crisis or difficulty.
   402         1.66     True            I often must sleep over a matter before I decide what to do.

Social Sensitivity
   443         2.33     True            I am apt to pass up something I want to do because others feel that
                                        I am not going about it in the right way.
   511         2.33     False           I have a daydream life about which I do not tell other people.
   299         1.83     False           I think that I feel more intensely than most people do.
   201         1.57     False           I wish I were not so shy.
    64         1.43     False           I sometimes keep on at a thing until others lose their patience with
                                        me.

Vocational and Avocational Attitudes (M-F Tendencies)
   283         1.86     True            If I were a reporter I would very much like to report sporting
                                        news.
   538         1.84     True            I think I would like the work of a dressmaker.
   423         1.79     True            I like or have liked fishing very much.

Resentful Attitudes
   237         2.29     False           My relatives are nearly all in sympathy with me.
   406         1.98     True            I have often met people who were supposed to be experts who were
                                        no better than I.
   250         1.80     True            I don’t blame anyone for trying to grab everything he can get in
                                        this world.
   319         1.67     True            Most people inwardly dislike putting themselves out to help other
                                        people.

Miscellaneous
   212         2.60     False           My people treat me more like a child than a grown-up.
   513         1.86     False           I think Lincoln was greater than Washington.
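
The scoring steps described under Method (standard scores with a mean of 50 and a sigma of ten, a Q minus L disparity, division into quarters, and the same-direction rule for retaining an item) can be sketched as follows. This is only an illustration under stated assumptions: the raw scores are randomly generated and hypothetical, the function names are not from the paper, and the population standard deviation is used because the text does not say which form was computed.

```python
import random
import statistics

def standard_scores(raw, mean=50.0, sigma=10.0):
    """Convert raw scores to standard scores with the given mean and sigma."""
    m = statistics.fmean(raw)
    s = statistics.pstdev(raw)  # assumption: population SD
    return [mean + sigma * (x - m) / s for x in raw]

def quarters_by_disparity(q_raw, l_raw):
    """Rank people by standardized Q minus L and cut into four equal groups.

    Q4 holds the greatest excess of Q over L, Q1 the lowest Q relative to L.
    Any remainder when the group size is not divisible by four is dropped.
    """
    diff = [q - l for q, l in zip(standard_scores(q_raw), standard_scores(l_raw))]
    order = sorted(range(len(diff)), key=lambda i: diff[i])  # ascending Q - L
    size = len(order) // 4
    return {f"Q{k + 1}": order[k * size:(k + 1) * size] for k in range(4)}

def keep_item(yes_q4, yes_q3, yes_q2, yes_q1):
    """Retain an item only if Q3 vs. Q2 differ in the same direction as Q4 vs. Q1."""
    return (yes_q4 - yes_q1) * (yes_q3 - yes_q2) > 0

# Hypothetical raw scores for 200 students (not the study's data).
rng = random.Random(0)
q_raw = [rng.gauss(40, 8) for _ in range(200)]
l_raw = [rng.gauss(60, 12) for _ in range(200)]
groups = quarters_by_disparity(q_raw, l_raw)
print({name: len(members) for name, members in groups.items()})
print(keep_item(30, 25, 20, 10), keep_item(30, 20, 25, 10))
```

Because items are selected on the same 200 cases that are later scored, a sketch of this kind also makes plain why the resulting correlation is inflated, as the text notes.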


Results 


The 43 items are given in Table 1, along 
with the t value of the difference between the 
percentage of Q4 + Q3 women who answered 
Yes to the item, in contrast with the percent- 
age of Yes answers by the Q2 + Q1 women. 
Also given in Table 1 is the group MMPI 
number for each of the 43 items, the true-or- 
false manner of marking characterizing the 
higher Q women, and, finally, an arbitrary 
categorizing of the items into groups with cer- 
tain type names. A discussion of the nine cate- 
gories follows: 

The first group consists of a cluster of six 
items, labeled “Sexual Inversion or Immatur- 
ity.” The Q women are found to admit attrac- 
tion to their own sex more frequently; the dif- 
ference is significant at the .01 level of confi- 
dence. They deny liking to flirt, they prefer 
to work with women, and they avoid “sexy” 
shows if possible. The item with the highest t 
value of all is, “I have never been in love with 
anyone.” The more frequent marking of the Q 
women to this item is False. Whether this 
means that the higher Q women have had as 
many heterosexual attachments as the L 
women and more with a homosexual tinge, it 
is impossible to say from the data at hand. Fi- 
nally, the Q women believe that women should 
have more sexual freedom. This item has the 
lowest t value and its interpretation is some- 
what dubious. The avoidance of the socially- 
toned sexual situations and a greater tendency 
to resort to like-sex attraction is apparent in 
this small cluster of items. 

The second group of items, eight in num- 
ber, “Religiose or Rose-Colored-Glasses Atti- 
tude,” has as its most valid item, “I do not 
always tell the truth.” Sixty-one per cent of 
the higher Q women marked this item False, 
in contrast with a 37% similar marking for 
the higher L women. This item with its falla- 
cious self-evaluation sets the tone for the seven 
which follow. The higher Q women were good 
little girls, less frequently being sent to the 
principal (they say) for “cutting up.” They 
are more certain there is only one true religion 
and have more patience with others who be- 
lieve in the same way (no doubt the patience 
is greater if the religion in question coincides 
with their own). The cleverness of a crook 
does not amuse them; on the other hand they 
have been inspired to a program of duty which 
they carefully follow. They are more certain 
of a Devil and a Hell in afterlife. Finally, they 
deny a tendency to keep quiet when they are in 
trouble. The “goody-goody,” straight-laced, 
immature conventionality of the higher Q 
women is here sharply in evidence: They have 
a higher opinion of their personal veracity and 
goal strength, and they are much surer of their 
convictions in this age of relativistic thinking. 
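
The t values in Table 1 are ratios of a difference between two percentages to its standard error. As a rough check on the item just discussed (61 per cent against 37 per cent marking False), the sketch below assumes 100 women in each group and a standard error based on the pooled proportion; the paper does not give its formula, so both are assumptions. The result is about 3.39, near the 3.43 printed for item 45; a standard error rounded to .07 would give .24/.07 = 3.43.

```python
import math

def diff_of_proportions_ratio(p1, n1, p2, n2):
    """Ratio of the difference between two independent proportions to its
    standard error, using the pooled proportion (an assumed formula)."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# 61% of higher-Q women vs. 37% of higher-L women marked the item False.
ratio = diff_of_proportions_ratio(0.61, 100, 0.37, 100)
print(round(ratio, 2))  # 3.39
```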

The third set, “Antipathy Toward the Ver- 
bal,” is the sharpest and most clear-cut of all 
the categories. What makes this group so pleas- 
ing to the researcher is that it might have been 
predicted in advance: The people with a rela- 
tively high Q, contrasted with their L, simply 
do not like to read. They would not like to be 
journalists. They like to read neither about 
history nor about science; reading newspaper 
editorials is also anathema. They find it hard 
to express themselves before a class; they ad- 
mit more frequently that they are poor learn- 
ers. Since the secondary-school curriculum is 
usually highly verbal, the statement about 
their being a “poor learner” may have a modi- 
cum of truth in it. However, it must not be 
forgotten that Santa Barbara College requires 
what amounts to a B average from high school 
for admission to its program; but it seems 
probable from some evidence to be introduced 
later in this paper that the higher Q women 
did have a somewhat lower secondary average 
than did the L women. This cluster of items 
would seem to make the hypothesis tenable that 
pleasure in reading is in part a function of the 
direction and deviation of the Q—L functions: 
As the Q increases, relative to the L, the 
amount of reading-for-fun should drop. If one 
may coin a phrase, further investigation should 
show the Q-higher-than-L person to be an “a- 
verbalist”—one who gets no pleasure from the 
printed page. 

The set of four items which follows, here 
called “Obsessive-Compulsive Tendencies,” 
was labeled thus after some hesitation since 
none of the items appears on the Pt scale of 
the MMPI. The first item, worry over lock- 
ing the door and closing the window, has one 
of the highest t values of the whole set of 43 
items, 4.14. It is followed by two “phobia” 
items relating to disease. It is possible that these 
two disease items, both significant at the .01 
level, give a partial clue to the shying-away- 
from-sex items found in the group first dis- 
cussed. Whether in early training, through the 
Logic of Nonverbal Relations (N. Cameron’s 
term) “germs” were equated with “bad” sex- 
ual activity, is obviously not known; but it 
serves as an interesting conjecture. The fourth, 
and final, item of this set was included after 
some hesitation; it could have gone into the 
next group just as well, or, for that matter, 
into the miscellaneous group. The malaise over 
health seemed, however, to go rather logically 
with the two disease items which preceded. 

The next set of items, five in number, is 
called “Anxiety Syndrome,” and includes such 
symptoms as generalized anxiety, inability to 
concentrate, shrinking from crises, inability to 
make decisions, and a feeling of unreality. The 
anxiety here clearly shown might possibly be 
called “free floating,” unless the clinician as- 
sumed that the anxiety was bound by disease 
processes and their putative linkage with fear 
of “unclean” sexual activity. Regardless of the 
meaning of these five items and the four which 
precede, it is all too clear that a somewhat 
dysphoric mood is more frequently character- 
istic of the high Q woman at Santa Barbara 
College when she is compared with her L sis- 
ter. 

Five items follow which are labeled “Social 
Sensitivity” and which put the Q women in a 
somewhat more favorable light than the two 
sets which preceded this one. The Q woman is 
more responsive to social pressure, e.g., she is 
apt to stop doing something if others disap- 
prove (two items, MMPI numbers 64 and 
443). She denies being shy and does not think 
of herself as feeling more intensely than other 
people do. She denies (.05 level) having a day- 
dream life which she does not share with oth- 
ers. This latter item, question 511 of the 
MMPI, would seem to be in agreement with 
Munroe’s finding [2] about the greater sub- 
jectivity of the L woman—and collaterally, 
her greater production of M, which, accord- 
ing to traditional Rorschach interpretation, is 
in part a measure of one’s fantasy potential. 


Surprisingly, the greater degree of social re- 
sponsiveness, noted in this set of items, does 
not agree with Wells’s (and Adler’s) opinions 
relative to the greater self-sufficiency of the Q 
individual. 


Frankly, the writer expected a number of 
items in the Mf scale of the MMPI would 
distinguish the Q and L women. His reason- 
ing: Q higher than L is more frequently a 
male than a female pattern, if both sexes are 
thrown together in a distribution; hence, it 
was felt, the women with “male” intellectual 
patterns should be more like males, in terms 
of attitudes. Such, however, did not prove to 
be the case. Three items show up in the set 
called “Vocational and Avocational Attitudes 
(M-F Tendencies),” but their paucity in 
number and their relatively low t values sug- 
gest that this variable, if present, is present in 
a weak and modified degree. The Q women 
do admit liking to fish; and, though they do 
not like the idea of being a journalist (cf. 
Item 204 in “Antipathy Toward the Ver- 
bal”), they say more frequently than the L 
women that they would like to report sporting 
news. Both of these items may be regarded as 
subserving attitudes presumably belonging in 
the masculine sphere. But the third item shows 
that the Q women would also like the work of 
a dressmaker, a quite definitely feminine pur- 
suit. The writer’s belief is that none of the 
three items represents sexually toned attitudes 
at all; the more probable interpretation would 
be that the Q women prefer to work with 
“things” where words are not necessary (as- 
suming in this instance that it is the sports and 
not the reporting which appeals to them).² 


In definite opposition to the Pollyanna per- 
sonality facet so sharply delineated in the sec- 
ond set of items, “Religiose or Rose-Colored- 
Glasses Attitudes,” is the set of four items, 
“Resentful Attitudes.” The Q woman’s rela- 
tives are not in sympathy with her; many so- 
called experts are no wiser than she; people 
dislike having to help others; and she doesn’t 
blame those who try to grab everything they 
can latch onto in this miserable world. “No 
frustration without aggression” is a popular 
catchword these days. The constricted Q 
woman who will not permit herself any hetero- 
sexual activity, even flirting, pays for her im- 
maturity by concern over diseases, generalized 
anxiety, and finally, in this set, by paranoid pro- 
jection. The verbal cord of the preceding sen- 
tence ties together several articles of disparate 
data into one bundle; whether the bundle 
would loosen under the jarring impact of fur- 
ther experimentation cannot be predicted. Suf- 
fice it to say that Q women have, beneath their 
Pollyanna-ish, conventional facade, consider- 
able anxiety and some prickly-pear attitudes. 

²A personal communication from Dr. Solomon 
Diamond [1], Los Angeles State College, reported 
that a factor analysis which included the Michigan 
Vocabulary Profile and the Q and L fractions of 
the ACE (male veterans, N = 100) resulted in the Q 
of ACE and the Sports section of the Michigan 
Vocabulary showing up on the same factor. Wheth- 
er this Q-sports linkage for males has any relevance 
to the above MMPI item is not apparent. 


The final set, two “Miscellaneous” items, 
may be swiftly passed over. The Q women 
feel that their parents more frequently treat 
them as adults than is the case with the L 
women. This attitude is in somewhat puzzling 
contrast with the adverse feelings toward rela- 
tives noted in the set just preceding. Possibly, 
if one cannot hate his parents, he can hate one 
who stands closest by blood ties, one who, in 
this instance, stands in loco parentis. The Lin- 
coln-Washington item may or may not have 
any significance. Lincoln is frequently felt as 
being humane, warm, and wise; Washington 
is the “Father,” stern, just, righteous. It may 
be that the Q women need the strong, patriar- 
chal figure, just as they need the orthodox re- 
ligion, the Devil, and a Hell. And then one 
could be overgeneralizing a slender bit of data. 


The 43 items have now been discussed in a 
more-or-less piecemeal fashion. The question 
naturally arises in the mind of the statistically 
sophisticated reader: Are the items so labori- 
ously educed from the MMPI really valid? Only 
a partial answer can at present be given. One 
hundred women who entered Santa Barbara 
College at the same time as the experimental 
200 were used for cross validation. Most of 
these 100 women were transfer students, main- 
ly at the junior level; as a consequence they 
were older, somewhat more mature intellec- 
tually, with a somewhat attenuated range, 
compared with the freshmen women, on the 
Q and L fractions of the ACE. Despite these 
limitations and the further attenuating factor 
of the probably low reliability of the “person- 
ality” items, the linear r of the 43 items with 
the Q—L differentials, again expressed in 
standard scores, was .250, just barely short of 
acceptance at the .01 level. From a statistical 
standpoint, it would seem that the 43 items 
have a modicum of validity—for college 
women of preponderantly old-line American 
stock, who enter liberal arts colleges, the en- 
trance to which is predicated upon an approxi- 
mate B average from high school. From a com- 
mon-sense standpoint, the items which show 
the relative aversion of the Q woman to the 
printed page would seem to confer a sort of 
“face” validity as well; but it must be remem- 
bered that the writer expected the Q women 
to be more masculine in their attitudes than 
they proved to be. 
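
The statement that an r of .250 from 100 cases falls just short of the .01 level can be checked with a short sketch. It assumes the usual t test of a product-moment r against zero with N - 2 degrees of freedom, and a tabled two-tailed .01 critical value of roughly 2.63 for 98 degrees of freedom; the text does not say which table or test was actually used, and the function and constant names are illustrative only.

```python
import math


def t_from_r(r: float, n: int) -> float:
    """t statistic for testing a Pearson r against zero (n - 2 degrees of freedom)."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


# Validation group: r = .250 with N = 100 (figures taken from the text).
t_validation = t_from_r(0.250, 100)

# Assumption: approximate tabled two-tailed .01 critical t for 98 df.
CRITICAL_T_01 = 2.63

print(round(t_validation, 2), t_validation < CRITICAL_T_01)
```

Under these assumptions the obtained t lies a little below the critical value, which agrees with the wording "just barely short of acceptance."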


One final statistical footnote will be entered. 
The honor-point ratios for the fall semester, 
1948-1949, of both sets of women, the 200 in 
the tryout group and the 100 in the validation 
group, were computed and converted into 
standard scores. The Q—L differentials for 
both groups were then correlated with the 
honor-point ratios. For the 200 women in the 
tryout group, the r was .108; for the 100 
women, the corresponding r was .179. Both r’s 
went in the expected direction, the high Q 
women making somewhat lower grade aver- 
ages than was true of the high L women. Since 
the college curriculum, like that in secondary 
schools, is primarily verbal, in contrast to the 
quantitative, this finding follows expectations. 
It may also help explain the more frequently 
given answer by the Q women: “I was a slow 
learner in school.” 


General Discussion of Findings 


The Q-higher-than-L college woman, when 
compared with her L-higher-than-Q sister, 
tends statistically to be a somewhat more anx- 
ious, straightlaced, conventional, dysphoric 
person who dislikes to read. One gathers the 
impression that she is immature, is still cling- 
ing to a childlike pattern. As a late adolescent, 
she does not question the sexual proscriptions 
she heard as a child; nor does she question the 
religious orthodoxy of the early home. She pays 
for her immaturity through a generalized, 
“free-floating” type of anxiety in which, pos- 
sibly, her fear of infection with “germs” is 
associated with forbidden sexuality through
the operation, early in life, of the Logic of
Nonverbal Relations, as Norman Cameron 
calls it. The Pollyanna attitude is not com- 
pletely pervasive, for certain resentful atti- 
tudes shine through. She likes to deal with 
“things” and activities; contrariwise, the print-
ed page is not liked. If the writer correctly
senses what Munroe [2] meant in describing 
the Sarah Lawrence L woman as more subjec- 
tive, the Q woman as more form bound and 
literal, he would be inclined to agree. Com- 
pared with the Q, the L woman is freer, less 
hidebound, more adventurous, less anxious, 
more literate, and more mature. On the other 
hand, no confirmation is found in the present 
data for the belief of Adler, quoted by F. L. 
Wells [3, p. 72], that the Q person is more 
self-sufficient, the L person more socially 
oriented. 


If one assumes that subsequent research will 
confirm these findings, he may feel constrained 
to make a tentative attempt at an explanation 
of the differences here noted. One could be- 
come unreservedly Freudian, assume a more 
frequent early cross-parental fixation on the 
part of the Q women, a fixation associated with 
the introjection of the literal, prosaic, Q-dom- 
inated concern with things-as-they-are—which 
the prevailing stereotype associates with the 
masculine pattern. The Q women with such a 
fixation retain these “masculine” attitudes; but
in the development of the super-ego the fixa- 
tion on the father subsides with the comple- 
mentary result that the proscriptions and ta- 
boos relating to the forsaken sexual object are 
more fully incorporated than is customary. 
This hypertrophy of the super-ego is paid for 
through anxious, dysphoric, resentful attitudes. 

The writer would prefer not to accept such 
a wordy set of constructs if a reasonable sim- 
pler explanation is at hand. Whatever the basic 
cause of the Q-higher-than-L continuum, we 
do find certain women who are more at home 
in dealing with quantities than they are in 
dealing with the fuzzier overtones of verbal 
connotations. These women do not like to read 
about history, nor do they like to read about 
science; in fact, they do not like to read. Could
not the explanation be simply that they did 
not learn to read well during their early ele-
mentary school days; and, though in their duti-
ful way, through various social pressures, they
continue to climb the educational ladder, it is
not done with pleasure? They read, yes, when 
they have to do so—their textbooks and li- 
brary assignments. But unlike the majority of 
the L girls, who grow up to explore with 
pleasure the exciting world of books, the Q 
girls do little reading, though accepting at face 
value their textbooks and Sunday School quar- 
terlies. Diverse and omnivorous reading might 
have divested the Q girls of their orthodoxy in 
a majority of instances—but most of them 
didn’t read what they were not forced to read, 
simply because it was drudgery. For we shy 
away from those things we do not do well. 


Some caveats should now be entered. It 
must be remembered that the population stud- 
ied encompasses a narrow range of talent and 
is far from being typical of college women, 
let alone the general population; that many of
the items on which the present discussion is 
based could have slipped by the coarse statis- 
tical mesh through the operation of chance fac- 
tors; that any of the items discussed is only 
slightly more true of one group than it is of 
the other; that a validation r of .25 accounts 
for only about 1/16 of the total variance in 
the Q-L dichotomy of the ACE on the part of 
the women here studied. But at least a begin- 
ning has been made, using techniques which 
may be replicated; and the findings, meagre as
they are, are expressed in terms of quantity. 
F. L. Wells feels that the Q-L variable is im- 
portant in personality structure; Munroe 
showed there were considerable differences in 
projections made on the Rorschach; and the 
present study, with the MMPI, has caught a 
few stray samples of what, for college women, 
may be personality correlates of the quantita- 
tive-verbal continuum. More than this cannot 
at present be said. 
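
The "1/16 of the total variance" figure in the caveats is simply the square of the validation r. A minimal sketch of that arithmetic follows; the function name is illustrative, and the .63 used for comparison is the tryout-group r reported in the summary, which the author regards as spuriously high.

```python
def variance_explained(r: float) -> float:
    """Proportion of variance shared by two variables correlated at r."""
    return r * r


validation_share = variance_explained(0.25)  # 0.0625, i.e. 1/16
tryout_share = variance_explained(0.63)      # about 0.40, inflated by item selection

print(validation_share, round(tryout_share, 2))
```

The drop from the tryout figure to the validation figure illustrates why the author treats only the cross-validated r as evidence for the scale.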


Summary 


1. A 43-item scale was derived from the 
group MMPI after an analysis of its 567 
items, based on the standard score differentials 
of the Q and L subtest groups of the ACE.
The population consisted of 200 freshmen
women entering Santa Barbara College in Sep- 
tember, 1948. 

2. The 43 MMPI items yielded an r of 
.63 with Q—L differentials on the ACE for 
the 200 women. The manner of obtaining the 
items would, of course, unduly maximize this 
r, which is spuriously high. 

3. For a validation group of 100 women 
(mainly upper-division students) at the same 
college, the r proved to be only .25. The dif- 
ference in age, maturity, and intellect of this 
validating group probably helped attenuate the 
observed relationship; and, finally, the prob- 
ably low reliability of the 43-item, true-false 
scale, and the doubtless equally low reliability 
of the Q—L differentials may have further de- 
creased the relationships found in the prelimi- 
nary group. 

4. The Q-higher-than-L college women
were characterized by conventional, inhibited, 
sexual and religious attitudes; by displeasure 
in reading; by an anxious, dysphoric cast; by 
certain resentful attitudes which were at vari- 
ance with other, more numerous Pollyanna 
tendencies; by a more dependent attitude on 
the opinions of others. 

5. A tentative hypothesis for explaining the 
present data is offered. The immaturity of the 
Q-higher-than-L college woman is possibly a 
function of the lack of diverse, catholic read-
ing which would have called into question cer-
tain of her prim, conventional attitudes.


A Psychometric Comparison of Achieving and
Nonachieving College Students of High Ability¹


Henry H. Morgan 


Wesleyan University 


This paper considers some measured inter- 
ests, personality traits, and motives of achiev- 
ing and nonachieving college students of high 
ability and investigates the relationships be- 
tween these measured personality variables 
and the students’ scholastic success. 


Much of the work investigating the rela- 
tion between nonintellectual factors and aca- 
demic achievement has been reviewed by Bo- 
row [1], Lord [17], and others [3, 7, 13, 20, 
23, 25, 28]. In general, the studies of non- 
intellectual factors and achievement have not 
yielded clearly consistent results. The lack of 
consistent results may be due, in part, to the 
variety of measuring instruments used in these 
studies, the different populations which have 
been tested, and the varying definitions used 
in establishing achiever and nonachiever 
groups. The study reported below appears to 
confirm some of the previous findings regard- 
ing personality variables related to achieve- 
ment. 


Procedure 


A group of male sophomore students of high 
scholastic aptitude was selected from the College 
of Science, Literature, and the Arts (SLA) of the 
University of Minnesota on the basis of their scores 
on the 1947 American Council on Education Psy- 
chological Examination (ACE) taken prior to their 
enrollment as freshmen in college. These students 
had obtained, on the ACE, a raw score of 136 or 
more which placed them at or above the 90th per- 
centile on Thurstone’s 1947 norm group of 34,658 
males who were freshmen in four-year colleges. 


¹The material contained in this paper is ab-
stracted from a Ph.D. thesis [20] submitted to the
Graduate Faculty of the University of Minnesota.
Microfilm copies of the thesis and of all data ana-
lyzed may be obtained through University Micro-
films, 313 N. First Street, Ann Arbor, Michigan.

A total of 132 men were thus selected, and a dis-
tribution of the freshman year honor-point ratios
(HPR) or grade averages was made. The distribu- 
tion of HPR’s was then divided into three parts, 
high range, middle range, and low range. Each of 
these parts had a range of approximately the same 
number of HPR units and contained approximately 
the same number of students. The three groups, A, 
B, and C, which were thus formed, are defined in 
Table 1. 


Table 1

Division of High-Ability Students into Three Groups
on the Basis of Honor Point Ratio*

         Honor Point Ratio
         Median    Range          Number in
Group    Value     (inclusive)    Each Group
A        2.4       2.1-3.0         43
B        1.7       1.3-2.0         49
C         .8        .3-1.2         40
Total    1.7        .3-3.0        132

*HPR computed according to formula:
(3A + 2B + C) ÷ (A + B + C + D + F)
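
As an illustration of the footnote formula, read here as (3A + 2B + C) divided by (A + B + C + D + F), the sketch below computes an HPR and places it in one of the three ranges of Table 1. It assumes that A through F stand for the number of credits (or courses) earned at each letter grade, which the footnote does not state; the function names and the example transcript are hypothetical.

```python
from typing import Optional


def honor_point_ratio(a: float, b: float, c: float, d: float, f: float) -> float:
    """HPR = (3A + 2B + C) / (A + B + C + D + F); D and F earn no honor points."""
    total = a + b + c + d + f
    if total == 0:
        raise ValueError("no graded credits")
    return (3 * a + 2 * b + c) / total


def hpr_group(hpr: float) -> Optional[str]:
    """Place an HPR, rounded to one decimal, in the ranges given in Table 1."""
    value = round(hpr, 1)
    if 2.1 <= value <= 3.0:
        return "A"
    if 1.3 <= value <= 2.0:
        return "B"
    if 0.3 <= value <= 1.2:
        return "C"
    return None  # below the lowest HPR observed in this sample


# Hypothetical transcript: 6 credits of A, 6 of B, 3 of C.
example_hpr = honor_point_ratio(6, 6, 3, 0, 0)
print(round(example_hpr, 1), hpr_group(example_hpr))
```

On this reading a straight-A record yields the ceiling of 3.0, which matches the upper limit of the range shown for Group A.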


In order to obtain two clearly defined groups 
which could be characterized as achievers and non- 
achievers, only Group A, achievers, and Group C, 
nonachievers, were considered in this study. The 
achievers obtained honor grades while the non- 
achievers obtained grade averages below the mean 
HPR average of the freshman class in the college 
of SLA for the academic year 1949-50. 

The students originally selected in Group A and 
Group C numbered 83. At the time of testing 70 
of these students were available, 40 in Group A 
and 30 in Group C. These two groups did not dif- 
fer significantly in ACE scores or in age. The 
groups did differ significantly in percentile rank in 
high school. For the achievers (N = 39) the mean
percentile was 95, SD 6.6; for the nonachievers 
(N = 30) the mean percentile was 73.2, SD 17.4; 
F value is 7.08, p < .01. 

The students had taken the Strong Vocational In-
terest Blank, Revised, Form M, in 1949 prior to
their registration in college. Personal letters and
phone calls arranged for further testing of the sub- 
jects in January 1951 during their sophomore year. 
At this time the subjects took a short six-picture 
version of the Thematic Apperception Test (TAT), 
the Minnesota Multiphasic Personality Inventory 
(MMPI), and answered a short series of semi- 
structured questions about themselves. Except in the 
case of the Strong Blank where results were avail- 
able on 40 achievers and 27 nonachievers, compari- 
sons cited are based on 40 achievers and 30 non- 
achievers. 


Results 


Analysis of the Strong Vocational Interest 
Blank. 

A pattern analysis of the Strong Blank was 
made employing criteria of pattern designation 
similar to those suggested by Darley [2, p. 17]. 
There were no differences between the achiever 
and nonachiever groups in the total number of 
primary, secondary, or tertiary patterns. How-
ever, achievers and nonachievers differed sig-
nificantly² in the types of interests represented
by their patterns on the Strong Blank. Forty 
per cent of the achievers had primary or sec- 
ondary patterns in Group V (social-service oc- 
cupations) while only 11 per cent of the non- 
achievers had such patterns in this group of 
occupations. Likewise, considering primary, 
secondary, or tertiary patterns, more achievers 
than nonachievers had interest patterns in 
Group V (60 vs. 26 per cent). On the other 
hand, significantly more nonachievers than 
achievers had patterns in Group VIII (busi- 
ness detail) (59 vs. 38 per cent) and Group 
IX (sales-contact) (59 vs. 35 per cent). 
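
The percentage comparisons above rest on a critical-ratio test. The paper does not give the exact formula, so the following is only a sketch of one common form, the difference between two independent proportions divided by a standard error based on the pooled proportion. The Group V figures (40 per cent of 40 achievers against 11 per cent of 27 nonachievers with Strong Blank results) are taken from the text; the function name is illustrative.

```python
import math


def critical_ratio(p1: float, n1: int, p2: float, n2: int) -> float:
    """Difference between two independent proportions over its pooled standard error."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    return (p1 - p2) / se


# Primary or secondary Group V patterns: achievers vs. nonachievers.
cr_group_v = critical_ratio(0.40, 40, 0.11, 27)

# A CR beyond 1.96 corresponds to the two-tailed .05 level of the normal curve.
print(round(cr_group_v, 2), cr_group_v > 1.96)
```

An unpooled standard error would give a somewhat different ratio, so the value printed here should be read as an approximation to the author's test rather than a reproduction of it.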

²Percentage differences which are referred to as
significant have been tested using a critical-ratio
technique and have yielded a CR significant at or
beyond the .05 level.

Comparisons of achievers and nonachievers
on three “nonoccupational” keys of the Strong
Blank appear in Table 2.

Strong [24, p. 265] states that there is no 
clear evidence of the role of interest maturity 
(IM) in achievement in college, but he thinks 
that, if intelligence were held constant, a posi- 
tive relationship would be observed between 
achievement and IM scores. The difference of 
4.7 standard-score points between achievers 
and nonachievers would seem to confirm 
Strong’s hypothesis. Translated into maturity 
in terms of chronological age by the use of a 
table presented by Strong [24, p. 263], non- 
achievers appear to have had an average IM 
score close to the average IM score of 17-year- 
olds while the achievers obtained an average 
IM score equal to the mean score of 24- or 25- 
year-olds. Both groups averaged less than 18 
years of age when they took the Strong Blank. 
Darley [2, p. 60] suggests that a high score 
on the IM scale characterizes a well-organ- 
ized, socially sensitive, generally mature, tol- 
erant, and insightful individual. 

Although a positive relationship between 
achievement and the Occupational Level 
(OL) scale has been demonstrated in college 
by Ostrom [22], Kendall [16], and others 
[24, p. 201], the OL scores shown in Table 
2 do not show such a relationship, both achiev- 
er and nonachiever means being between three 
and four standard-score points above the popu- 
lation mean of 50. The OL scale has been in- 
terpreted as a measure of drive [22] or moti- 
vation [2, p. 60], but it is probably equally a 
reflection of socioeconomic level. Although 
data on subjects’ socioeconomic level are not 
available in the present study, the hypothesis 
is advanced that the high OL scores of achiev- 
ers and nonachievers may reflect the relatively 


Table 2 
Standard Scores of Achievers and Nonachievers on Three Nonoccupational Scales of the 
Strong Vocational Interest Blank: Interest Maturity (IM), Occupational 
Level (OL), and Masculinity-Femininity (MF) 














Achievers Nonachievers 
Nonoccupational N40 N=27 Mean F Prob.» t Prob., 
Scales Mean SD Mean SD Diff. Value Value 
IM 53.5 5.1 48.5 6.6 4.7 1.65 >.10 3.250 <.01 
OL 53.8 7.0 53.2 6.3 6 1.19 >.10 356 >.10 
MF 49.1 10.2 50.4 8.0 1.3 1.63 >.10 551 >.10 
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high socioeconomic status of high-ability stu- 
dents in a liberal arts college. 

Judgments were made by two psycholo-
gists³ as to the congruency of measured inter-
ests and interests implied by each subject’s ed-
ucational or vocational choice made as a fresh-
man and again as a sophomore.

Judgments placed each individual in one of 
four categories: congruent, cannot say, not 
congruent, or no choice indicated. The cate- 
gorizing was done on the basis of a joint deci- 
sion by the two judges who had first rated the 
subjects independently. Contrary to expecta- 
tion, more achievers than nonachievers had 
chosen educational goals which were not con- 
gruent with their measured interests. About 
one-third of the achievers had measured inter- 
ests which did not conform to the interests im- 
plied by their educational choice while about 
one-sixth of the nonachievers were judged not 
congruent. 


Analysis of the Minnesota Multiphasic Per- 
sonality Inventory (MMPI). 


Five types of analysis were used with the 
MMPI: (a) clinical scales, (b) special or ex-
perimental scales, (c) profile sorting, (d) pat-
tern analysis, and (e) code comparison.


Clinical scales. Comparisons of the mean
scores of achievers and nonachievers on the 12
validity and clinical scales of the MMPI were
made and the results were essentially negative.
On the MMPI clinical scales both groups
scored close to the college averages reported by
Hoyt [15] and Gilliland [5].

³The two judges were Donna D. Morgan and the
writer.


Special and experimental scales. Compari-
sons of the mean scores of achievers and non-
achievers were also made on the following 12
special MMPI scales.

Sie     Social introversion-extroversion [4]
Do      Dominance [9]
Sp      Social participation [11]
Re      Social Responsibility [10]
Pr      Prejudice [12]
St      Social Status [6]
Ds      Dissimulation [8]
Ac      Achievement [7]
Iq      Intellectual efficiency [8]
Psy     Psychology [8]
IPAR    Institute of Personality Assessment and Research [8]
A       Manifest anxiety [26]


Table 3 shows that achievers scored signifi-
cantly higher than nonachievers on the special
scales of Dominance (Do), Social Responsi-
bility (Re), and Intellectual Efficiency (Iq).
On each of these scales the mean scores of both
groups, while differing significantly from each
other, were above the mean scores of the origi-
nal norm groups for these scales. The Do scale,
on which the achievers were higher than the
nonachievers, implies such characteristics as op-
timism, persuasiveness, self-discipline, and reso-
luteness [27]. Characteristics measured by the
Re scale are said to be dependability, trust-
worthiness, and a sense of obligation to the
group [10]. The Iq scale is considered to
reflect self-confidence, energy, realistic atti-
tudes, and insight [27]. Several achievement
scales, Ac, Psy, and IPAR, yielded no differ-
ences between the achievers and nonachievers.
These negative results on the achievement
scales may be due, in part, to the fact that Ac
was derived from a high-school population
and Psy and IPAR from a graduate-school
population.


Table 3

Comparison of Raw Scores of Achievers and Nonachievers on 12 Special or
Experimental Scales of the MMPI

Special or        Achievers      Nonachievers                                             No. of
Experimental       N = 40          N = 30       Mean     F      Prob.     t      Prob.    Scale
Scale, MMPI      Mean    SD     Mean    SD      Diff.  Value     (F)    Value     (t)     Items
Sie              23.5    8.1    23.4    6.9       .1    1.35    >.10     .054    >.10      70
Do               18.8    3.0    16.5    2.9      2.3    1.05    >.10    3.214    <.01      28
Sp               23.2    3.6    22.5    4.1       .7    1.31    >.10     .749    >.10      32
Re               23.6    3.0    21.1    3.9      2.5    1.80    <.10    2.985    <.01      32
Pr                6.6    3.7     8.3    3.9      1.7    1.12    >.10    1.821    <.10      32
St               23.0    3.4    22.6    3.2       .4    1.14    >.10     .493    >.10      34
Ds                8.4    4.0     9.1    3.7       .7    1.16    >.10     .732    >.10      74
Ac               16.8    2.3    16.3    2.7       .5    1.33    >.10     .819    >.10      34
Iq               51.1    3.2    48.6    4.0      2.5    1.62    >.10    2.860    <.01      59
Psy              24.1    5.2    23.1    4.0      1.0    1.67    >.10     .857    >.10      38
IPAR             13.1    2.4    12.8    1.3       .3    1.68    >.10     .569    >.10      18
A                10.8    6.1    11.4    6.3       .6    1.08    >.10     .399    >.10      50
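
The t values reported for Table 3 can be approximated from the published means, standard deviations, and group sizes. The sketch below assumes an ordinary pooled-variance t test for two independent groups; the paper does not state the exact formula, and the rounded summary figures mean the results will agree only approximately with the tabled values (3.214 for Do, 2.985 for Re, 2.860 for Iq). The function name is illustrative.

```python
import math


def pooled_t(mean1: float, sd1: float, n1: int,
             mean2: float, sd2: float, n2: int) -> float:
    """Two-sample t computed from summary statistics with a pooled variance."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (mean1 - mean2) / se


# Achievers (N = 40) vs. nonachievers (N = 30), figures as printed in Table 3.
t_do = pooled_t(18.8, 3.0, 40, 16.5, 2.9, 30)  # Dominance
t_re = pooled_t(23.6, 3.0, 40, 21.1, 3.9, 30)  # Social Responsibility
t_iq = pooled_t(51.1, 3.2, 40, 48.6, 4.0, 30)  # Intellectual Efficiency

for name, value in (("Do", t_do), ("Re", t_re), ("Iq", t_iq)):
    print(name, round(value, 2))
```

All three recomputed values fall close to the tabled ones, which suggests that a test of this general form was used, although small discrepancies remain.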

Profile sorting. Three clinical psychologists⁴
were asked to sort 70 unidentified MMPI
profiles into achiever and nonachiever cate-
gories. Each judge sorted the profiles twice,
once without knowledge of the exact number
of achievers and of nonachievers and again with
the knowledge that there were 40 achievers
and 30 nonachievers.

All the sortings were better than 50 per cent
accurate although only one was significantly
better than chance (64.3 per cent correct, χ²
= 5.83, p < .01).⁵ One of the judges sorted
20 selected profiles, of which he was most cer-
tain, into achiever and nonachiever categories
with 75 per cent accuracy (using Yates’s cor-
rection, χ² = 3.23, p < .05).⁵

Pattern analysis. Each MMPI profile was
coded according to the system suggested by
Hathaway [14]. Sortings of these coded pro-
files were made considering eight MMPI
scales: Hypochondriasis (Hs), Depression
(D), Hysteria (Hy), Psychopathic deviate
(Pd), Paranoia (Pa), Psychasthenia (Pt),
Schizophrenia (Sc), and Hypomania (Ma).
In these sortings, significantly more nonachiev-
ers than achievers had one of their two highest
points on the Pd scale (35 vs. 13.8 per cent)
and more nonachievers than achievers had one
of their two low points on the Pa scale (25 vs.
7.5 per cent). These differences may indicate
that more nonachievers than achievers are
somewhat insensitive, callous, self-centered, or
irresponsible. However, in considering the ab-
solute magnitude of scores on the Pd scale, as
in a comparison of group means or a count of
Pd scores of 55 or over, there are no significant
differences between the achievers and non-
achievers. The greater percentage of non-
achievers with profile elevations on Pd is part-
ly a reflection of an absence of higher scores
on the neurotic scales which were more often
obtained by the achievers.

⁴The writer wishes to thank Drs. Harrison G.
Gough, Starke R. Hathaway, and Paul E. Meehl
who made this part of the study possible.

⁵A one-tailed test of significance appears appro-
priate here [19].

Code comparisons. A weight was assigned 
to each MMPI T score depending upon its
magnitude. Scores of 70 or more received a 
weight of 3; scores of 69 through 55 a weight 
of 2; scores of 54 through 46 a weight of 1; 
and scores of 45 or less a weight of zero. Code- 
comparison indexes were computed between 
randomly paired profiles. The absolute differ- 
ences between individual scale weights were 
summed. Indexes thus represent the sum of 
weight differences between two profile codes. 
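
The weighting rule and the code-comparison index just described can be sketched as follows. The weights and the sum of absolute differences come from the text; the restriction to whole-number T scores, the eight-scale profile length, the function names, and the two example profiles are assumptions made only for illustration.

```python
from typing import Sequence


def score_weight(t_score: int) -> int:
    """Weight for a whole-number T score: 70+ -> 3, 55-69 -> 2, 46-54 -> 1, 45 or less -> 0."""
    if t_score >= 70:
        return 3
    if t_score >= 55:
        return 2
    if t_score >= 46:
        return 1
    return 0


def code_comparison_index(profile_a: Sequence[int], profile_b: Sequence[int]) -> int:
    """Sum of absolute differences between the scale weights of two profiles."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same scales")
    return sum(abs(score_weight(a) - score_weight(b))
               for a, b in zip(profile_a, profile_b))


# Hypothetical T scores on eight scales (Hs, D, Hy, Pd, Pa, Pt, Sc, Ma).
profile_x = [52, 60, 58, 71, 44, 63, 55, 49]
profile_y = [48, 50, 66, 57, 47, 72, 40, 53]

print(code_comparison_index(profile_x, profile_y))
```

An index of zero means two profiles carry identical weights on every scale; larger values indicate less similar codes.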

Three comparisons were made using ran- 
domly paired profile codes. A distribution was 
made of indexes computed from 50 pairs of 
achiever profile codes; another distribution was
made of indexes from 50 pairs of nonachiever 
profile codes; and a third distribution was 
made of indexes computed from 50 achiever 
profile codes paired with 50 nonachiever pro- 
file codes. The mean values and standard de- 
viations of these three distributions of code 
comparison indexes were approximately equal 
indicating that there was great heterogeneity 
of profile codes within these three groups. 

A frequency count was made for the four 
possible code positions of each MMPI scale 
for achiever and nonachiever groups. From 
this frequency count a “trend code” was con- 
structed for the achiever group by choosing for 
each MMPI scale the code position which 
yielded the largest percentage difference in 
favor of the achievers vs. the nonachievers. A 
similar trend code was constructed which re- 
flected the largest percentage differences in 
favor of the nonachievers vs. the achievers. 

The individual achiever and nonachiever pro- 
file codes were then sorted by computing code 
comparison indexes for each achiever and non- 
achiever profile code paired first with the 
achiever trend code and then with the non- 
achiever trend code. Each profile code was 
thus automatically categorized as achiever or 
nonachiever on the basis of these code compari- 
son indexes computed by pairings with the 
trend codes, a profile code being called achiever
if it yielded a lower comparison index when
paired with the achiever trend code and being
called nonachiever if it yielded a lower index
when paired with the nonachiever trend code.
This sorting was 70 per cent correct in cate-
gorizing achiever and nonachiever MMPI
profiles (χ² = 10.17, p < .01 – .001).⁶
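
The automatic sorting by trend codes amounts to a nearest-code rule, which the following sketch illustrates. The trend codes are written here as lists of scale weights (0 to 3) with made-up values, since the actual trend codes are reported in Hathaway's notation and are not translated here; the handling of ties is also an assumption, because the text does not say how equal indexes were treated.

```python
from typing import Sequence


def weight(t_score: int) -> int:
    """Weight for a T score: 70+ -> 3, 55-69 -> 2, 46-54 -> 1, 45 or less -> 0."""
    return 3 if t_score >= 70 else 2 if t_score >= 55 else 1 if t_score >= 46 else 0


def index(weights_a: Sequence[int], weights_b: Sequence[int]) -> int:
    """Code-comparison index: sum of absolute weight differences."""
    return sum(abs(a - b) for a, b in zip(weights_a, weights_b))


def classify(profile: Sequence[int],
             achiever_trend: Sequence[int],
             nonachiever_trend: Sequence[int]) -> str:
    """Label a profile by whichever trend code gives the lower index."""
    weights = [weight(t) for t in profile]
    to_achiever = index(weights, achiever_trend)
    to_nonachiever = index(weights, nonachiever_trend)
    if to_achiever < to_nonachiever:
        return "achiever"
    if to_nonachiever < to_achiever:
        return "nonachiever"
    return "undecided"  # assumption: ties are left unsorted


# Hypothetical trend codes, expressed as weights on eight scales.
ACHIEVER_TREND = [1, 2, 2, 1, 1, 2, 1, 1]
NONACHIEVER_TREND = [1, 1, 1, 2, 0, 1, 1, 2]

print(classify([50, 58, 60, 52, 47, 61, 50, 48], ACHIEVER_TREND, NONACHIEVER_TREND))
print(classify([49, 50, 51, 62, 40, 48, 52, 57], ACHIEVER_TREND, NONACHIEVER_TREND))
```

Because the trend codes are built from the same profiles they later sort, a rule of this kind will tend to look more accurate on the original sample than on new cases.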


The Thematic Apperception Test (TAT). 

A short version⁷ of the TAT developed by
McClelland and others [18] was used to
measure achievement motivation. McClel-
land’s scoring system “C” was used to score
the stories of the 70 subjects for Murray’s
need for achievement (n Ach) [21, p. 144, p. 
164]. 

The stories of 20 individuals chosen at ran-
dom from the total group of 70 subjects were
scored by two scorers⁸ in addition to the writer,
and product-moment correlation coefficients
were computed as a measure of inter-scorer re-
liability. These coefficients indicated fairly
substantial agreement between scorers (r’s
were .54, .56, .66; per cent agreements on
basic scoring categories were 75, 78, 81).

⁶The trend codes derived are unreliable since
some of the per cent differences in code position
frequencies between achiever and nonachiever
groups are small and not statistically significant
and also because minor changes in T-score values
may shift MMPI codes considerably. To some ex-
tent the trend codes are a statistical artifact, and
they should be viewed cautiously until cross-vali-
dated. Each code was as much an alienation code
for one group as it was a trend code for the other
group. The eight-scale codes used were: Achievers,
23'17—; Nonachievers, 9'48-136.

⁷Six pictures were projected on a screen from
2” x 2” slides in the following order:

(B) Two men at a machine
(H) Boy with a book at a desk
(A) Father and son (TAT, 7BM)
(G) Boy and operation scene (TAT, 8BM)
(E) Lawyer’s office, two men conversing
(D) Older man handing papers to younger man
seated at desk

⁸The writer wishes to thank Drs. James J. Jen-
kins and Wallace A. Russell for their part in this
study.

Table 4 shows that the achiever group
scored significantly higher than the non-
achiever group in achievement motivation as
measured by the TAT. The distribution of
n Ach scores of achievers appeared approxi-
mately normal while the nonachiever n Ach
score distribution was bimodal with one cluster
above a score of 18 and another cluster below
a score of 11. Twenty per cent of the achievers
scored at or below the nonachiever median of
12 while 37 per cent of the nonachievers scored
above the achiever median of 18.


Semistructured Questions.

Each subject made three responses to each
of the following three questions:

A. Who are you? If you could make just three
statements to describe who you are what
might they be?

B. If you could be granted any three wishes what
might you wish?

C. What are your three greatest fears?

The responses of the subjects were scored
for n Ach using the criteria employed in scor-
ing the TAT protocols. A response was given
a score of +1 for n Ach if it met at least one
of the criteria for any of the TAT n Ach
scoring categories given by McClelland [18].
Table 4 shows that achievers scored signifi-
cantly higher than nonachievers on achieve-
ment motivation as reflected in responses to
questions A, B, and C.

The same responses to questions A, B, and
C were also scored for a variable designated as
“other-centeredness” (OC). It was hypoth-
esized that the differences noted on Group V
of the Strong Blank might be reflected in re-
sponses to semistructured questions. Response
statements which expressed nonegocentric or
other-centered, “Group V” attitudes were
scored +1 and summed for each individual.
Table 4 shows that achievers scored signifi-
cantly higher than nonachievers on OC indi-
cating concern for or awareness of others.


Table 4
Comparison of Achievers and Nonachievers on Need for Achievement (n Ach) Scores
Derived from TAT Stories, on Need for Achievement (n Ach) Scores Derived
from Responses to Questions A, B, and C, and on Other-Centeredness
(OC) Scores Derived from Responses to Questions A, B, and C

                    Achievers        Nonachievers
                     N = 40            N = 30        Mean     F      Prob.     t      Prob.
Variable           Mean     SD      Mean     SD      Diff.  Value     (F)    Value    (t)*
n Ach (TAT)       17.80    7.11    13.47    8.31     4.33    1.38    >.10    2.313    <.02
n Ach (A,B,C)      2.25    1.22     1.60    1.02      .65    1.42    >.10    2.330    <.02
OC (A,B,C)         1.35    1.30      .63    1.12      .72    1.31    >.10    2.398    <.01

*One-tailed t test.


Summary 


A group of male sophomore students of high 
scholastic aptitude was selected from the Col- 
lege of Science, Literature, and the Arts 
(SLA) of the University of Minnesota. These 
students, chosen on the basis of high scores on 
the ACE, were divided into groups according 
to honor point ratio earned during their fresh- 
man year in college. One group contained 
students who earned more than a B average 
during their freshman year; another contained 
students who earned grades below the average 
of all SLA freshmen. These groups were des- 
ignated achievers and nonachievers respective- 
ly. 

Psychometric comparisons of achievers and 
nonachievers were made on the Strong Vo- 
cational Interest Blank, the Minnesota Mul- 
tiphasic Personality Inventory (MMPI), the 
Thematic Apperception Test (TAT), and a 
series of semistructured questions. 

Achievers and nonachievers did not differ 
significantly in variety of well-developed in- 
terests as manifested in interest patterns on 
the Strong Blank. However, achievers and 
nonachievers were found to differ signifi- 
cantly in the types of interests indicated by 
patterns on the Strong Blank. Significantly 
more achievers than nonachievers had interests 
typical of persons in social service or welfare 
occupations (Group V) while more non- 
achievers than achievers had interests typical 
of persons in business detail occupations 
(Group VIII) and business sales occupations 
(Group IX). 

On Strong’s Blank, achievers scored signifi- 
cantly higher than nonachievers on a scale of 
Interest Maturity (IM). Achievers and non- 
achievers in this sample did not differ signifi- 
cantly from each other in Occupational Level 
(OL) scores, nor did the two groups differ in 
femininity of interests indicated by Strong’s
Masculinity-femininity (MF) scale.

The measured interests of more achievers 
than nonachievers were found not to conform 
to the interests implied by their vocational or 
educational choices. 

Several special or experimental MMPI 
scales yielded significant differences in com- 
parisons of mean scores of the achiever and 
nonachiever groups. Achievers scored higher 
than nonachievers on an MMPI scale of 
Dominance (Do) considered to reflect, along 
with dominance or ascendancy in a social situ- 
ation, such characteristics as optimism and 
persuasiveness. Achievers also scored higher 
than nonachievers on a scale of Social Respon- 
sibility (Re) which is believed to reflect depend- 
ability, integrity, and seriousness. Achievers 
likewise scored higher than nonachievers on a 
scale of Intellectual Efficiency (Iq) implying
efficiency, energy, self-confidence, and insight- 
ful, realistic attitudes. A number of other 
special scales, including some designed to pre- 
dict academic achievement, did not yield signifi- 
cant differences between the two groups. 

Three clinical psychologists twice sorted un- 
identified MMPI profiles into achiever and 
nonachiever categories with better than 50 per 
cent accuracy. However, only one sorting was 
significantly better than chance. One judge 
also sorted 20 selected MMPI profiles of 
which he was most certain into achiever and 
nonachiever categories with 75 per cent ac- 
curacy. 

Analysis of the MMPI indicated that more 
nonachievers than achievers had profile eleva- 
tions on the Psychopathic deviate (Pd) scale 
and profile low points on the scale of Para- 
noia (Pa). Such differences may indicate that 
more nonachievers than achievers are some- 
what callous, socially insensitive, irresponsible, 
and self-centered individuals. 

A hypothetical MMPI code was construct- 
ed for the achiever group and another for the 
nonachiever group. Although these codes may 
be unreliable, they made possible automatic 
sorting of profiles with 70 per cent accuracy. 
Both groups appeared heterogeneous in MMPI 
profile patterns. For the most part, the 
MMPI clinical scales, considered either in- 
dividually or as a profile pattern, did not show 
a clear relationship to scholastic achievement 
in this study. For a small group of students 
certain MMPI profile patterns appear related 
to achievement. 

Achievers were found to score significantly 
higher than nonachievers in achievement moti- 
vation or need for achievement (n Ach) ex- 
pressed in stories written in response to pic- 
tures making up a short version of the The- 
matic Apperception Test (TAT). 

Similarly, achievers scored higher than non- 
achievers in achievement motivation (n Ach) 
expressed in response to several semistructured 
personal questions. The achievers also scored 
higher than the nonachievers when these same 
responses were scored for other-centeredness 
(OC) indicating concern for or awareness of 
other persons. 


Conclusions 
This study has indicated several nonin- 
tellectual factors or personality variables 
which appear related positively to the academ- 
ic achievement of high-ability college students. 
1. Maturity and seriousness of interests. 
2. Awareness of and concern for other persons. 
3. A sense of responsibility. 
4. Dominance, persuasiveness, and self-con- 
fidence. 
5. Motivation to achieve, or the need for 
achievement. 
Received January 14, 1952. 
A New Test of “Validity” for the Group MMPI 


Robert Buechley 


Department of Public Welfare, Territory of Alaska 


and Harry Ball 


University of Hawaii 


One of the outstanding characteristics of 
the Minnesota Multiphasic Personality Inventory 
(MMPI) is the set of scales which indicate 
whether or not a respondent has made an honest 
effort and has understood what he is answering. 
These three scales, the ?, L, and F 
scales, have been called “validity” tests. 


The “Validity” Scales of the MMPI 


The ? scale simply denotes the clinical scales 
as unusable if the respondent fails to answer 
too many items. The L scale detects those per- 
sons who have probably lied in order to make 
themselves appear as extremely exemplary per- 
sons. 

The logic behind the F scale is of a statis- 
tical nature. A high score is obtained by an- 
swering a large number of items contrary to 
the answer given by over 90 per cent of the 
norm group. This accumulation of rare an- 
swers, many of which have a content which 
might be regarded as bizarre, is intended to 
detect those persons who are deliberately falsi- 
fying or who are responding in a random fash- 
ion. It is of no consequence to the scoring 
whether the unusual or random answers be 
caused by a lack of interest, by a lack of un- 
derstanding, by a deliberate intent not to co- 
operate in taking the test, or by clinical factors. 

The last possibility is of particular impor- 
tance, for the F scale is highly related to the 
Schizophrenia or Sc scale. We would expect 


1At the time this research was carried out both 
authors were on the staff of the University of Min- 
nesota. The sample used in the study was made 
available by Mr. Arthur E. Prell, director of the 
University of Minnesota Penology Research Fund, 
established by the Graduate School. Permission to 
publish these data was obtained from the Califor- 
nia Youth Authority and the Fred C. Nelles School. 
The testing was conducted by R. B. Van Vorst, 
Senior Psychologist of the School. 


such a correlation in view of the bizarre con- 
tent of some of the F-scale items, which re- 
sults in some of the items being in both scales. 
It has, in fact, been suggested that where the 
Sc score is high and there is additional evi- 
dence for a diagnosis of schizophrenia, a high 
F score might be interpreted as further sub- 
stantiation of the diagnosis. But in many cases 
in which the group MMPI is employed, only 
the most superficial additional data on the re- 
spondents are available. Here there is no basis 
on which to separate the effects of schizo- 
phrenia from those of random responses. 
Furthermore, the F scale in no sense solves 
the problem of random responses when all 566 
items of the booklet are used. The F scale is 
based entirely on the first 300 items, the front 
side of the IBM answer sheet. Much random- 
ness due to boredom is of a progressive nature. 
The importance of the next 266 items, the 
back side of the IBM answer sheet, is made 
obvious when we note how many items from 
the clinical scales are included there. These 
questions include 35 per cent of the Paranoia 
items, 48 per cent of the Psychasthenia items, 
40 per cent of the Schizophrenia items, and 30 
per cent of the Suppressor Variable or K items. 
Since this last is used as a corrective device for 
the Hypochondriasis, Psychopathic Deviate, 
Psychasthenia, Schizophrenia, and Hypomania 
scales, we see that the answers to the last 266 
items determine to a large extent the scores on 
six of the nine clinical scales plus the K scale. 
Only the Depression, Hysteria, and Mascu- 
linity-Femininity Interest scales are unaffected. 
The new scale reported in this paper was 
developed in an effort to solve both these problems: 
(a) as a supplement to the F scale in 
detecting those persons making random responses, 
and (b) as a guide in determining 
which answer sheets may be regarded as valid 
efforts when both the F and the Sc scores are 
high.2 


Description of the Tr Scale 


This test, called the Test-retest or Tr scale, 
makes use of the sixteen items in the MMPI 
booklet form which are exact repetitions of 
sixteen previous items.3 It merely compares the 
two responses to these identical items. The 
number of contradictory responses constitutes 
the score. 
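
The scoring rule just described can be sketched in a few lines of Python. The pair list is the set of fourteen pairs used in this research, as given in the footnote listing the paired items; the function name `tr_score`, the dictionary format for an answer sheet, and the handling of omitted answers are assumptions made only for this illustration.

```python
# Fourteen repeated item pairs (first occurrence, repetition), as listed in the
# footnote; pairs 23-388 and 13-290 were omitted from this research.
TR_PAIRS = [
    (8, 318), (15, 314), (16, 315), (20, 310), (21, 308), (22, 326),
    (24, 333), (32, 328), (33, 323), (35, 331), (37, 302), (38, 311),
    (305, 366), (317, 362),
]


def tr_score(answers, pairs=TR_PAIRS):
    """Count the pairs whose two responses contradict each other.

    `answers` maps item number -> True / False. Assumption for this sketch:
    a pair with a missing answer on either item is simply not counted.
    """
    score = 0
    for first, second in pairs:
        a, b = answers.get(first), answers.get(second)
        if a is None or b is None:
            continue
        if a != b:
            score += 1
    return score


# Hypothetical answer sheet: consistent everywhere except two pairs.
sheet = {item: False for pair in TR_PAIRS for item in pair}
sheet[318] = True   # contradicts the answer to item 8
sheet[362] = True   # contradicts the answer to item 317
print(tr_score(sheet))
```

With the Hankes answer sheet all sixteen pairs could be passed in through the `pairs` argument instead.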

There is little danger of the random an- 
swerer becoming aware of the duplicate items 
because of the distance which separates most 
of the pairs. Two paired items are separated 
by 60 and 44 intervening items, the other four- 
teen by at least 260 items. Furthermore, the 
percentage responses for the norm groups indicate 
that for only two items4 are there strong 
possibilities that the contradictory responses 
might be due to ambivalence about the re- 
sponse. 


An Illustration of the Tr Scale 


A single illustration may serve to demon- 
strate how the Tr offers at least a partial so- 
lution to both the problems mentioned above. 
The cases employed are a sample of 137 of 
the inmates of the Fred C. Nelles School for 
Boys, Whittier, Calif. They may be described 
generally as uncooperative, poorly motivated 
adolescents with a placement of at least the 
sixth grade. They are used here because they 
present a wide distribution on the F scale. 

Their scores on both the F and Tr scales 
are presented in Table 1. The coefficient of 
correlation between them is +.63. This offers 
empirical evidence that the F scale does detect 


2Since its inventors called the F scale a test of 
“validity,” we have done likewise. We are aware 
that it measures the validity of the respondents’ 
efforts and understanding rather than the more 
technical factors usually associated with tests of 
validity. 

3The paired items from the booklet are the following: 
8-318, 15-314, 16-315, 20-310, 21-308, 22-326, 
24-333, 32-328, 33-323, 35-331, 37-302, 38-311, 
305-366, 317-362, 23-388, and 13-290. The last 
two listed were omitted from this research [1]. 

4These are items 15-314 and 317-362. In the first 
instance the norm groups split almost evenly in 
their reply, while in the second about 22 per cent 
of the norm groups answer “cannot say” [4]. 
Table 1 

Frequencies of Raw Scores on the F and Tr Scales 
of the MMPI 

                         F Scale 
Tr Scale   0-3   4-7   8-11   12-15   16-19   20-23   24 & over   Total 

10 1 1 
9 1 1 2 
8 1 1 1 3 
7 2 1 1 2 6 
6 2 1 3 6 
5 1 2 3 4 1 3 14 
4 1 2 5 3 1 4 16 
3 1 11 4 3 1 1 21 
2 1 5 2 2 2 2 3 17 
1 8 6 6 2 2 2 2 27 
0 & 3 1 2 1 24 

Total 14 26 29 21 16 10 21 137 



random answering, but it also indicates that 
the Tr scale has an independent function. 

A tentative cutting point for the Tr was 
established. We assumed that a comparison of 
the first and second responses constituted a 
test-retest, and measured the degree of relationship 
between the two. Because the problem 
involved the correlation of dichotomized 
true-false variables, the phi coefficient was 
used [3, p. 92]. For each Tr score the coefficient 
of reliability was the maximum possible, 
assuming that half the first responses 
were false and half true.5 The results are presented 
in Table 2. Only fourteen paired items 
were used here for reasons which will be discussed 
when the scoring procedure is presented. 


Table 2 

Calculated Maximum Phi-coefficients 

Score    φ       φ² 

0        1.00    1.00 
1        .86     .74 
2        .74     .55 
3        .66     .44 
4        .52     .27 
5        .41     .17 
6        .28     .08 
7        .00     .00 


5There are certain dangers in assuming this even 
division. Thirteen of the sixteen items, and eleven 
of the fourteen used here, are answered false by a 
substantial majority of the norm groups. Thus a 
person answering the last half of the test randomly 
with an extreme bias toward false might escape 
detection. However, such extreme patterns can 
usually be discovered by inspection. 

The major problem was to determine where 
to establish the cutting point between perfect 
reliability and zero reliability. The score of 3 
was selected, because at this point phi-squared 
fell below 0.50. 
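
One possible reading of the computation behind this cutting point is sketched below in Python. It assumes fourteen pairs with the first responses split seven true and seven false, and for a given number of contradictions it tries every way of dividing them between true-to-false and false-to-true changes, keeping the largest phi. The function names and the enumeration are assumptions for illustration only; the authors do not spell out their arithmetic, so the values produced need not agree with the published table in every cell.

```python
import math


def phi(tt, tf, ft, ff):
    """Phi coefficient of a 2x2 table (first response by second response)."""
    denom = (tt + tf) * (ft + ff) * (tt + ft) * (tf + ff)
    if denom == 0:
        return 0.0  # a zero margin leaves phi undefined; treated as 0 here
    return (tt * ff - tf * ft) / math.sqrt(denom)


def max_phi(score, n_pairs=14):
    """Largest phi attainable with `score` contradictions among `n_pairs`,
    assuming half of the first responses are true and half false."""
    half = n_pairs // 2
    best = -1.0                          # phi can never be lower than -1
    for x in range(score + 1):           # x = true -> false changes
        y = score - x                    # y = false -> true changes
        if x > half or y > half:
            continue
        best = max(best, phi(half - x, x, y, half - y))
    return best


def cutting_point(n_pairs=14, threshold=0.50):
    """Lowest score whose maximum phi-squared falls below `threshold`."""
    for score in range(n_pairs // 2 + 1):
        if max_phi(score, n_pairs) ** 2 < threshold:
            return score
    return None


for s in range(8):
    p = max_phi(s)
    print(s, round(p, 2), round(p * p, 2))
print("cutting point:", cutting_point())
```

Under these assumptions the first score whose maximum phi-squared drops below 0.50 is 3, which agrees with the cutting point chosen in the text.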

In analyzing the functions of the test the 
customary interpretation of the F scale was 
used: to consider F scores of less than 8 as 
valid, those between 8 and 11 as borderline, 
and those 12 and over as invalid. The F scores 
in Table 1 have been subdivided to fit these 
classifications.6 

Considering the first problem, how the Tr 
supplements the F scale in detecting random 
responses, we find that the Tr scale rejected 
50 per cent of the total cases. The F scale 
classed 29 per cent as valid, 21 per cent bor- 
derline, and 50 per cent invalid. The two 
scales agreed in rejecting 35 per cent and in 
accepting 27 per cent. About 2 per cent of the 
cases are accepted by the F but rejected by the 
Tr. These are presumably clear instances of 
the F scale failing to detect random responses. 
In addition, 62 per cent of the borderline 
group were eliminated by the Tr. Here we 
lose 13 per cent of the cases but salvage 8 per 
cent. Thus the Tr serves both as a further 
screen on those denoted as valid by the F and 
as a basis for a division of the borderline cases. 


The value of the Tr scale is demonstrated 
even more clearly when we consider the in- 
valid F cases. About 15 per cent of the cases 
were indicated as invalid by the F scale but 
as valid by the Tr. The Tr score thus provides 
a concrete basis for separating subjects whose 
high F score resulted from random responses 
from those whose responses may be validly and 
consistently bizarre. It remains to detect those 
in the high F group who achieved their score 
by a systematic and conscientious effort to dis- 
tort their scores. 
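
The cross-classification of the two scales used in this section can be sketched as follows. The F bands are those stated above (below 8 valid, 8 to 11 borderline, 12 and over invalid); reading the Tr cutting point as "reject at a score of 3 or more" is an interpretation, and the function names and sample scores are hypothetical rather than taken from the Nelles School data.

```python
from collections import Counter

TR_CUT = 3  # assumption: Tr scores of 3 or more are treated as rejected


def f_band(f_raw):
    """Customary F-scale bands as described in the text."""
    if f_raw < 8:
        return "valid"
    if f_raw <= 11:
        return "borderline"
    return "invalid"


def cross_classify(records):
    """Tabulate (F band, Tr decision) pairs as percentages of all cases.

    `records` is an iterable of (F raw score, Tr score) tuples.
    """
    records = list(records)
    counts = Counter(
        (f_band(f), "rejected" if tr >= TR_CUT else "accepted")
        for f, tr in records
    )
    total = len(records)
    return {key: 100.0 * n / total for key, n in counts.items()}


# Hypothetical scores for eight respondents.
sample = [(2, 0), (5, 4), (9, 1), (10, 6), (14, 2), (20, 7), (25, 0), (13, 5)]
for key, pct in sorted(cross_classify(sample).items()):
    print(key, f"{pct:.1f}%")
```

The ("invalid", "accepted") cell corresponds to the cases the text singles out: high F scores accompanied by consistent answers to the repeated items.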


The Scoring Procedure 


The major problem in the development of 
the Tr scale concerned the method of scoring 


6The most recent literature on the MMPI establishes 
the cutting point for invalidity at 18, a T 
score of 80. As the results of this research show, 
such a decision only increases the value of the Tr 
scale [2, p. 23]. 
when the IBM answer sheet is used. With the 
Hankes answer sheet there are no particular 
problems, for all the answers are on the same 
side of the page. In this case it is also possible 
to use all sixteen of the identical pairs. 

With the IBM sheets thirteen of the pairs 
are split, the item appearing first on the 
front side and then on the back side of the 
page. The problem of scoring is solved by fold- 
ing the right side of the upright answer sheet 
so that the edges of the cutaways just cover 
the dotted spaces for answering false to items 
31 to 60. It is then possible to observe all but 
two of the paired items simultaneously, hence 
the reason for the use of only fourteen of the 
pairs. 

A stencil may be made to facilitate the scor- 
ing. Merely cut the holes so as to reveal the 
same answer, true or false, for all 28 items on 
the folded sheet. The paired items may be con- 
nected with lines, denoted by the same letter 
of the alphabet, or both. With this method the 
Tr scale can be scored as quickly as the other 
scales of the group MMPI. 


Summary 

The validity scales of the group MMPI 
have two weaknesses: the F scale does not de- 
tect random answers on the back page of the 
IBM answer sheet, and it does not separate 
examinees who get high scores from random 
responses from those who may hold certain de- 
lusions. Both these problems are partially 
solved by the Tr scale which discovers incon- 
sistencies of response to the duplicated items 
in the booklet form. 
A Study of the Validity of the Index of 
Adjustment and Values1 


Glen E. Roberts 


University of Kentucky 


The validation of any test, especially in the 
area of personality, involves the problem of the 
selection of suitable criteria. At the present, 
the two most widely used standards have been 
correlations with other personality tests and 
clinical reports based on case studies. In an 
effort to circumvent the obvious inadequacies 
of these methods, another criterion, data from 
an experimental technique, was used in this 
study. 

The personality test investigated in this 
study was the Index of Adjustment and Val- 
ues designed by Bills, Vance, and McLean 
[1] to obtain measures of the self-concept, the 
attitude toward self, and the concept of the 
ideal self. The authors hold that emotionality 
is involved in those traits in which there is a 
discrepancy between the concept of self and the 
concept of the ideal self, and this discrepancy 
is defined as personal maladjustment. Further- 
more, the authors hold that the attitude which 
a person has toward himself is also an emo- 
tional area if the person rejects himself in his 
present condition. 

Since the Index of Adjustment and Values 
consists of a list of forty-nine trait words, it is 
readily adaptable to the technique of free asso- 
ciation. Therefore, it should be possible to com- 
pare emotionality as indicated by the self- 
ratings on the Index with emotionality as in- 
dicated by a free-association test. 

If the assumption that emotionality is indi- 
cated by certain self-ratings on the Index is 
valid, then reaction time as the primary indi- 
cator of emotionality in free association should 
verify the hypothesis by serving to separate cer- 
tain types of responses on the Index. In this 


1This article is based on a thesis submitted to the 
University of Kentucky in partial fulfillment of the 
requirements for the degree of Master of Arts [2]. 
The author wishes to express his gratitude to Dr. 
Robert E. Bills with whose assistance this work 
was undertaken. 


study it was hypothesized that (a) there 
would be no difference in reaction time between 
high and low ratings on the concept of 
self, (b) longer reaction times would occur on 
those traits wherein a person rejects himself, and 
(c) longer reaction times would occur on 
those words in which there was a discrepancy 
between the concept of self and the concept of 
the ideal self. 


Design 
Lower-division psychology classes at the 
University of Kentucky were tested with the 
Index of Adjustment and Values as a group 
test. From these classes, fifty female sopho- 
more and freshman students ranging in age 
from eighteen to twenty years were obtained 
as volunteers for a psychological experiment, 
the nature of which was not disclosed to the 
subjects. The subjects thus constituted a ho- 
mogeneous group in respect to age, sex, and 
educational level. 

Each subject was given the forty-nine traits 
of the Index in a free-association test in which 
a chronoscope and voice key, accurate to a 
hundredth of a second, were used to obtain re- 
action times. When the experimenter gave the 
stimulus word, the chronoscope automatically 
started, stopping when the subject spoke the 
response word. 

In the experimental situation, the subject 
was oriented to the nature of the procedure by 
an explanation and demonstration of the 
chronoscope and voice key. As an introductory 
exercise, four stimulus words were presented 
which served to acquaint the subject further 
with the procedure and to test comprehension 
of the directions. 

The list of the forty-nine personality traits 
was presented after the introductory exercise 
and after comprehension of the procedure by 
the subject was assured. In order to control 
the effects of fatigue, position of the stimulus 
words, and practice effects, the word list order 
was randomized with a different order for each 
subject. As each word was presented to the 
subject, the experimenter recorded the reaction 
time and the associative response as well as any 
overt signs of emotionality. 


Data 


The reaction times were used to test the hy- 
potheses stated previously. The data pertinent 
to each of these hypotheses are discussed in or- 
der below. 

The trait words were first dichotomized ac- 
cording to high or low ratings on the con- 
cept of self. Ratings of one and two comprised 
one group, and ratings of four and five com- 
prised the other group.2 Words which were 
given a rating of three were excluded from 
consideration since it was assumed that they 
were of neutral emotionality for the subjects. 
Reaction times for words in both of the groups 
were averaged, and inspection of these data re- 
veals that the means of the two groups of aver- 
ages were found to be approximately equal. A 
summary of these data is included in Table 1. 


Table 1 

Significance of Reaction Time Differences Between 
Groups of Words 

Measure              Ratings                 Mean Diff.   σ Diff.   t      P 

Concept of Self      Low*-High†              .01          - 
Acceptance of Self   Reject.*-Accept.†       .10          .031      3.23   .01 
Discrepancy          Discrep.-No Discrep.    .15          .027      5.55   .001 

*Low and rejection ratings include those items given a 
rating of 1 or 2. 

†High and acceptance ratings include those items given 
a rating of 4 or 5. 


In the second comparison, the words were 
divided according to the degree of acceptance 
of self. One group, the rejection ratings, con- 
tained all words in which the subject gave a 
rating of one or two, and the other group, the 
acceptance ratings, included all words in which 


2Ratings for the negative traits were reversed in 
order to make ratings for these traits comparable 
to those for positive traits. 
the subject gave a rating of four or five. 
Ratings of three were again excluded from consideration 
because of the assumed neutrality 
indicated by this rating. Reaction times were 
averaged in both groups for each subject, and 
the t test was employed to determine whether 
or not the difference of the means of the two 
groups of averages was significant. The data 
yielded a t of 3.23, which shows that at the .01 
level of confidence the two groups of ratings 
were significantly different in reaction time. 
Significantly longer reaction times occurred on 
those traits wherein the subject rejected him- 
self or gave a low rating of acceptance of self. 
These data are included in Table 1. 

In the third comparison, the trait words for 
each subject were dichotomized according to 
the discrepancy score which is the difference 
between the concept of self and the concept of 
the ideal self. Words which showed no dis- 
crepancy between the concept of self and the 
concept of the ideal self were placed in one 
group, and words which showed a discrepancy 
between the two ratings were placed in the 
other group. Reaction times for each group were 
averaged for each subject, and the difference 
of the means of the resulting distributions of 
averages was tested by use of the t test. 
This manipulation gave a t of 5.55, which may 
be interpreted at the .001 level of confidence 
to mean that the obtained mean difference dif- 
fers significantly from zero and that the null 
hypothesis may be rejected. The table includes 
a summary of these data. Thus, it is to be ob- 
served that significantly longer reaction times 
occurred on those words for which the subjects 
indicated discrepancy between the concept of 
self and the concept of the ideal self than on 
those words where no discrepancy was given. 
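
The comparisons above amount to a t test on per-subject differences between two mean reaction times. The Python sketch below shows that arithmetic; the function name and the reaction times are hypothetical, and whether the original analysis used exactly this standard-error formula is an assumption.

```python
import math
from statistics import mean, stdev


def paired_t(group_a, group_b):
    """t test on paired per-subject averages.

    Returns (mean difference, standard error of the difference, t).
    """
    diffs = [a - b for a, b in zip(group_a, group_b)]
    n = len(diffs)
    mean_diff = mean(diffs)
    se_diff = stdev(diffs) / math.sqrt(n)   # sample SD of differences / sqrt(n)
    return mean_diff, se_diff, mean_diff / se_diff


# Hypothetical mean reaction times (seconds) for five subjects:
# words showing a self/ideal discrepancy vs. words showing none.
discrepant = [1.62, 1.48, 1.75, 1.51, 1.66]
no_discrepancy = [1.45, 1.40, 1.58, 1.43, 1.47]
md, se, t = paired_t(discrepant, no_discrepancy)
print(f"mean diff = {md:.3f}, SE = {se:.3f}, t = {t:.2f}")

# Ratio of each tabled mean difference to its standard error, for
# comparison with the reported t values.
print(0.10 / 0.031, 0.15 / 0.027)
```

The last line simply divides the mean differences given in Table 1 by their standard errors, which is the relation the reported t values are expected to follow.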

An attempt was made to analyze further the 
above three comparisons in terms of blocking 
and of word associations. Blocking was con- 
sidered to have occurred when the subject 
could give no response or responded only after 
an extremely long reaction time. After the associations 
were classified according to the categories 
proposed by Woodworth [3], the occurrences 
of emotional or personal associations 
were compared with the occurrences of defini- 
tions, completions or predications, and coordi- 
nates. The data show no differences between 
the groups in terms of the occurrence of either 
blocking or emotional associations. 

The assumption has already been made that 
the words appearing in the Index were non- 
emotional or neutral for all subjects. Further- 
more, the question arose that perhaps it would 
be more difficult to associate to some of the 
words because of lesser familiarity or of less 
general usage of those words. To answer this 
question and to test the foregoing assumption, 
the average reaction time and the total number 
of blocks were computed for each word. None 
of the average times for any one word deviated 
as much as one standard deviation from the 
mean of all subjects on all words. Since the 
number of blocks on each word ranged from 
one to six, it was obvious by inspection that a 
large number of blocks did not occur on any 
of the words. Therefore, it was concluded that 
although there was some variation in the diffi- 
culty of association or the emotionality for all 
subjects on the words, this difficulty or emo- 
tionality did not differ significantly from word 
to word. 


Discussion 


The data of this experiment indicate three 
major findings. The first conclusion which may 
be drawn is that ratings on acceptance of self 
may be considered as indices of emotionality. 
This conclusion is substantiated by the finding 
which indicated that significantly longer re- 
action times occurred on those words upon 
which the subject gave a low rating of ac- 
ceptance of self. Rejection of self in the pres- 
ent condition, indicated by a low rating, was 
hypothesized earlier in the paper to be an emo- 
tional rating. 

The second conclusion which may be made 
from the data is that a discrepancy between the 
concept of self and the concept of the ideal 
self may also be considered an index of emo- 
tionality. In the experiment, significantly long- 
er reaction times occurred on those words in 
which a discrepancy was indicated. The dis- 
crepancy between the concept of self and the 
concept of the ideal self had been defined as 
maladjustment and the ratings were hypothe- 
sized to be indicators of emotionality for the 
subject. 

The data likewise show that the ratings 
on the concept of self may not be considered as 
indices of emotionality. This conclusion is 


drawn from the observation that reaction times 
were approximately equal in both the groups. 
It may be concluded, furthermore, that the 
concept of self by itself is not an index of emo- 
tionality unless the subject gives a low rating 
on acceptance of self or unless he indicates a 
discrepancy between the concept of self and the 
concept of the ideal self. This conclusion ap- 
pears logical since emotionality should not nec- 
essarily be involved for a subject simply be- 
cause he does not consider himself to be high 
in a particular trait. 

No evidence of the emotionality of self- 
ratings was obtained either from examining 
blocks of no response or from analysis of the 
type of word associations. It would appear 
that since these measures did not serve as evi- 
dences of emotionality, measurement of the 
emotionality of emotionally-neutral words 
must be undertaken by precise, objective tech- 
niques. 

Summary 

This study was an investigation of the vali- 
dity of the Index of Adjustment and Values. 
Measures of emotionality as indicated by this 
Index were compared with measures of emo- 
tionality as obtained from a free-association 
test. The subjects constituted a homogeneous 
group in respect to age, sex, and educational 
level. 

The results indicate that the self-ratings of 
the Index are valid indices of emotionality. Re- 
action time was significantly longer for trait 
words on which the subjects indicated dis- 
crepancy between concept of self and concept 
of the ideal self. A significantly longer reac- 
tion time was also found for words in which 
the subjects disclosed a rejection of self in 
their present condition. In addition, the re- 
sults reveal that the concept of self is not an 
index of emotionality unless a rejection or dis- 
crepancy is indicated upon the same person- 
ality trait. 

Received December 12, 1951. 
Self-Evaluation Compared with Group Evaluations1 


Wilse B. Webb 


Washington University 


How does an individual’s rating of himself 
compare with the ratings of him by a group of 
his close associates? The closeness or disparity 
of such ratings is of considerable importance. 
Practically, understanding and modifying 
these differences would lead to smoother social 
adjustments for the individual. Theoretically, 
the relationship between an individual’s idea 
of himself and others’ ideas of the individual 
has become an increasingly important problem 
in the areas of personality development and 
mental health [1]. In common sense terms, 
do we see ourselves as others see us? 

In performing a study of in-group attitudes, 
data were obtained which allowed a compar- 
ison of an individual’s rating of himself with 
the ratings of him by a group. The method 
used was similar to the method used by Sears 
in his study of projection [3]. 


Procedure 


The traits on which the ratings were ob- 
tained were “Personal Charm,” “Intelli- 
gence,” “Security,” “Jewish Appearance,” and 
“Jewish Acceptance.” 


The variables were defined on each rating sheet 
as follows: 

1. Personal Charm: Likeable, good “personality,” 
mixes well with others. 

2. Security: Self-assurance, sure of himself, free 
from anxiety, doesn’t become easily disturbed, shows 
little doubt, calm in most situations, not a worrier. 

3. Intelligence: A “bright” guy, creative, insightful, 
imaginative, understanding, original, ability to 
meet any situation successfully and to guide actions 
toward desired goals. 

4. Jewish Appearance: Superficial appearance, 
facial and bodily expressiveness, manner of speak- 
ing and gesturing, “looks and acts Jewish.” 

5. Jewish Acceptance: Ideals, believes in Jewish 
tradition and heritage, accepts Jewish idealism 
either theoretically or in practice. Thinks of himself 
as a Jew (in all respects), feels strong affiliation 
with his Jewish brethren, prefers Jewish associations 
over all others. 


1The author was assisted by Mr. Maurice Zemlick 
in the collection and statistical analysis of the 
data. 


For each trait a rating from 1 to 7 was possible 
with the extremes labeled as “least” and 
“most.” 


The rating group was 31 members of a 
Jewish fraternity at Washington University. 
On a meeting night of the fraternity, the 
author and an assistant obtained permission to 
talk to the group. A brief informal talk was 
given. The following points were emphasized: 
(a) They were being asked to take part in a 
research project. (b) The project was of concern 
to them. (c) They were to rate the members 
of their fraternity on the five variables 
described above. (d) They were to rate only 
those persons whom they felt they knew well 
enough to rate. (e) They were to rate themselves 
on these traits. These ratings were to 
be as they felt themselves to be, not as they 
anticipated the rating of others would be. (f) 
The individual data would be kept anonymous 
but the general results would be reported to 
them. The anonymity of the rating was em- 
phasized particularly. 

Forty-five rating scales were passed out to 
each person. Each scale had the name of a 
member of the fraternity on it. One scale, of 
course, contained each man’s own name. The 
subjects then dispersed themselves and turned 
in their ratings when they had finished. No 
time limit was fixed. 


Results and Discussion 


Correlations between the average rating 
given an individual by the group and the 
rating of the individual on the five variables 
were obtained. The smallest N from which an 
average was obtained was 12. The largest N 
was 30. The median “rating” N was 22. For 
the 31 cases on which self-ratings were available 
the obtained correlations were as follows: 
(a) Personal Charm, .434; (b) Security, .220; 
(c) Intelligence, .602; (d) Jewish Appearance, 
.321; (e) Jewish Acceptance, .419. 
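
The computation behind each of these coefficients can be sketched in Python as follows. The text does not name the coefficient, so the use of the product-moment correlation is an assumption, as are the function names and the small set of hypothetical ratings.

```python
import math
from statistics import mean


def group_average(ratings_received):
    """Mean of the ratings one member received from the others on a trait."""
    return mean(ratings_received)


def pearson_r(xs, ys):
    """Product-moment correlation between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical 1-to-7 ratings on one trait for four members. Each inner list
# holds the ratings a member received; list lengths differ because raters
# rated only those they felt they knew well enough.
self_ratings = [5, 6, 4, 7]
ratings_from_group = [[4, 5, 5], [5, 5, 6, 4], [4, 3, 4], [6, 5, 6]]
group_means = [group_average(r) for r in ratings_from_group]
print(round(pearson_r(self_ratings, group_means), 3))
```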

What are the meanings of these data? It is 
assumed that the ratings were given seriously 
by both the group and the individual. If this 
assumption is correct, the group rating re- 
presents a valid description of how an individ- 
ual appears to a group of individuals who are 
well acquainted with him on the variables con- 
sidered. Admissibly, each single rating is prob- 
ably warped by unique experiences, distorted 
by certain halos. It must be recalled, however, 
that each group rating is a combination of 12 
or more ratings in which unique factors are 
minimized and the “halos,” where operating, 
would be basic to the individual’s performance. 
The ratings are, in other terms, the role as- 
signed an individual by his group. 

An individual’s rating of himself is somewhat 
more complex. It is, at a minimum, a 
complex of his estimate of himself on these 
traits, how he would like to appear, and how 
he suspects he will be rated. Although we 
attempted to minimize the effort to approxi- 
mate the group rating and to maximize candid 
personal ratings, the factor was probably oper- 
ative. 


If these meanings may be attached to the 
measures obtained, our data indicate that a 
person’s notion of himself is considerably in- 
congruent with the role assigned him by his 
group. This incongruity is probably greater 
than indicated by the correlations obtained 
since the personal rating contained, to some 
extent, an attempt to approximate the rating 
that the group would assign in addition to his 
own independent estimates of himself. 

More detailed analyses of the nature of self- 
rating and group-rating differences were per- 
formed. The direction of the differences be- 
tween the individual ratings and the group 
ratings was determined for the total group. 
The percentages of overevaluations, i. e., the 
percentages of individuals’ ratings of themselves 
that were higher than the group-assigned 
ratings on each trait, were as follows: Personal 
Charm, 61 per cent; Security, 46 per 
cent; Intelligence, 79 per cent; Jewish Appearance, 
33 per cent; and Jewish Faith, 51 
per cent. 

It was noted that the tendency toward overevaluation 
was related to the general acceptance 
of a trait by the group. General acceptance 
was estimated by obtaining the mean 
self-rating on each trait under the assumption 
that if a trait was personally desirable or un- 
desirable to the group, this would be reflected 
in the individuals’ rating themselves nearer the 
“most” or “least” ends of the continuum on 
each trait. The self-rating means for Personal 
Charm, Security, Intelligence, Jewish Appear- 
ance, and Jewish Faith were 4.8, 4.7, 5.3, 2.4, 
and 4.2 respectively. It may be seen that the 
extreme high and low overevaluations were re- 
lated to the extreme high and low means (In- 
telligence and Jewish Appearance). 

The consistency with which a given individ- 
ual over- or underevaluated himself on more 
than one trait was of interest. Contrary to the 
results of Sears [3], our data revealed individual 
tendencies toward a “lack of insight” in 
the direction of overevaluations on a complex 
of traits. Sears used the following criterion of 
insight: “A subject was said to possess insight 
if he rated himself in the same half of the dis- 
tribution as others rated him, and to lack in- 
sight if he rated himself in the other half” 
[3, p. 155]. Sears then checked the tendency 
of individuals to lack insight on more than one 
trait. He found little tendency for subjects to 
lack insight consistently on a number of traits 
and concluded that, “It may safely be con- 
cluded that the tendency to lack insight is 
largely specific to any given trait and is not a 
general personal trait which shows its influ- 
ence on judgments made on all traits” [3, p. 
160]. 

We used a more rigid criterion of insight 
on the three traits in which positive and negative 
direction could be clearly defined: Intelligence, 
Security, and Personal Charm. An individual 
was given a plus if he rated himself above the 
group rating and a minus if he rated himself 
below the group rating. If no constant 
“bias” existed from trait to trait the individual 
should have an equal chance of rating himself 
above or below the group rating of each trait. 
Under this assumption, the distribution of 
pluses and minuses on the three traits should, 
for the individuals, be distributed binomially. 
The theoretical distribution of the pluses and 
minuses under these assumptions was obtained 
and compared with the actual distribution 
of pluses and minuses for the individuals of 
our sample. These results appear in Table 1. 

It may be noted that a significant chi square 
was obtained when the distributions were compared, 
and the major source of difference 
was in the individuals who rated themselves 
over the group rating on all three traits 
(“three pluses”). This difference between the Sears 
study and the present one may be a function of 
either the more rigid “insight” measure or the 
nature of the traits involved. The Sears study 
was concerned with unacceptable traits such 
as “stinginess,” “obstinacy,” etc. 


Table 1 

Theoretical and Obtained Distribution of Over and 
Under Ratings (Pluses and Minuses) on 
Three Traits 

            Three     Two pluses,   One plus,      Three 
            pluses    one minus     two minuses    minuses 
Expected     3.9        11.6          11.6           3.9 
Obtained    10.0        10.0           8.0           3.0 

χ² = 11.01. 
2% significance level = 9.83. 
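
The expected row of Table 1 follows from the binomial assumption described in the text: with 31 individuals, three traits, and an equal chance of a plus or a minus on each trait. The sketch below recomputes those frequencies and the chi-square statistic. The function names are illustrative only. With unrounded expected frequencies the statistic comes out close to, but not identical with, the printed 11.01; the rounding used in the original computation is not stated.

```python
from math import comb


def binomial_expected(n_people, n_traits=3, p=0.5):
    """Expected number of people with k pluses, listed for k = n_traits down to 0."""
    return [
        n_people * comb(n_traits, k) * p ** k * (1 - p) ** (n_traits - k)
        for k in range(n_traits, -1, -1)
    ]


def chi_square(observed, expected):
    """Pearson chi-square statistic summed over categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Obtained row of Table 1: three pluses, two pluses, one plus, three minuses.
observed = [10, 10, 8, 3]
expected = binomial_expected(sum(observed))   # 3.875, 11.625, 11.625, 3.875
chi2 = chi_square(observed, expected)

# 9.83 is the 2 per cent point for 3 degrees of freedom, as printed in Table 1.
significant = chi2 > 9.83
print(expected, round(chi2, 2), significant)
```

Nearly all of the statistic comes from the first category, which matches the remark that the excess lies among those who overrated themselves on all three traits.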


Our data are obviously related to the recent 
work of Holt on self-evaluation [2]. Holt re- 
lated self-ratings to a criterion of “expert” 
ratings by a diagnostic council. The diagnostic 
council was presumably composed of psycholo- 
gists, psychiatrists, and other social scientists. 
The difference or similarity of the two ratings 
was used as a measure of individual insight. 
There is no argument with the technique nor 
the definitions used by Holt. Our criterion, 
however, differed from that of Holt. It is quite 
likely that the two criteria, although correlated, 
measure different levels of personality varia- 
bles. The criterion measurements of Holt are 
probably more closely related to the funda- 
mental make-up of the individual. Our cri- 
terion, group ratings, is probably more related 
to the socially operating individual in which 
the fundamental characteristics are variously 
“repressed,” “modified,” “sublimated,” etc., in 
the light of a person’s social surroundings. 
From these definitions, it is possible to suggest 
at least three selves: (a) the objective 
self, defined in terms of his actual potentialities, 
capacities, and the like, i. e., the objectively 
measured individual; (b) the self-concept, 
the individual’s understanding or 
evaluation of these potentialities and capacities; 
(c) the social self, the operation of the 
potentialities and capacities in relation to his 
social environment. It is suggested that Holt’s 
comparisons were between selves a and b, and 
our comparisons are between selves b and c. 
It may be assumed that the three levels would 
be related, but disparities would be possible 
between all three levels. Unfortunately, our 
data were collected prior to Holt’s publication, 
and direct comparison of the self levels from 
our present data is not possible. 


Summary and Conclusions 

Self-ratings of individuals were compared 
with ratings of the individuals by a group in 
close contact with these individuals. The 
traits rated were “Personal Charm,” “Secur- 
ity,” “Intelligence,” “Jewish Appearance,” 
“Acceptance of Jewish Faith.” 

Low correlations were obtained between the 
mean group ratings and the individual 
ratings, indicating a considerable disparity between 
the individual’s concept of himself and 
the group’s concept of the individual on the 
variables measured. Personal over- and under- 
evaluation was, as a group, related to the ac- 
ceptability of a particular trait—a consistent 
tendency for overevaluation was obtained. 

Certain differences between the present 
method and that of Holt were noted. It is 
suggested that three “selves,” at least, may be 
conceptualized: the objective self, the self- 
evaluated self (the self-concept), and the 
social self. Our data are related to the latter 
two, those of Holt to the former two. 


Received November 8, 1951. 
A Factor-Analytic Study of Schizophrenic 
Symptoms 


Wilson H. Guertin 


Beatty Memorial Hospital 


Westville, Indiana 


Psychiatric research has been notoriously 
fruitless, probably as a consequence of poor 
classification resulting from inadequate obser- 
vations and definitions. Psychiatric classifi- 
cation should be based upon careful empirical 
and quantitative observations of relationships 
between well-defined symptoms. After estab- 
lishing sound classificatory concepts which can 
serve as the independent variables, systematic 
research should result in theory construction. 
Today, poorly differentiated syndromes pro- 
posed many years ago are being examined in al- 
most every possible statistical way with all 
kinds of observational procedures. The results 
of such investigations have been almost always 
negative and merely confirm objections to the 
psychiatric classifications that were made over 
fifty years ago. This unfavorable view of the 
current status of psychiatric research points 
to the need for more useful conceptual views 
of symptoms. 


A concise statement of the aims of factor 
analysis by Holzinger and Harman suggests 
the possible application of this technique 
to the reduction in complexity of symptom 
data: 


Factor analysis is a branch of statistical theory 
concerned with the resolution of a set of descriptive 
variables in terms of a small number of categories 
or factors. This resolution is accomplished 
by the analysis of the intercorrelations of the variables. 
A satisfactory solution will yield factors 
which convey all the essential information of the 
original set of variables. The chief aim is thus to 
attain scientific parsimony or economy of description 
[6, p. 3]. 


1 This study was submitted as a doctoral dissertation 
at Michigan State College in 1951, under the 
helpful direction of Albert I. Rabin, and with the 
technical advice kindly extended by Leo Katz. Particular 
mention must be made of the cooperation 
of Dr. R. J. Graff, Superintendent of Galesburg 
State Research Hospital, who made possible the 
collection of data while the author was in his employ. 


The application of factor analysis to symp- 
toms of mental illness is not a new approach. 
Dahlstrom [1], Eysenck [2], Moore [7], and 
Wittenborn [10] as well as others have stud- 
ied rather broad types of symptoms within the 
large domain of mental illness. The present 
study depends upon these previous ones for 
method, and investigates the more restricted 
domain of diagnosed schizophrenics. 


Subjects 


The sample incorporates one hundred diag- 
nosed schizophrenics admitted to a state hos- 
pital. It includes first admissions, readmis- 
sions, cases formerly hospitalized elsewhere, 
and returned escapees. The criteria for selec- 
tion were largely by exclusion since it would 
be circular to use schizophrenic symptoms as 
criteria in an empirical study of the symptoms. 
These bases for exclusion were: 


a. Neurological or anamnestic evidence of 
cerebral involvement, 

b. Aging which might result in the inclusion 
of cerebrovascular cases with prior good ad- 
justment, 

c. Menopause with prior good adjustment, 

d. Shock treatment less than two months 
before the interview, 

e. Mutism, poor English, or uncooperativeness 
leading to poor communication (excluding 
many catatonic stupor cases and some paranoids), and 

f. Cases that gave no positive indications of 
psychosis. 


2 The writer wishes to express his appreciation to 
Dr. A. Paul Bay, Superintendent of Manteno State 
Hospital, for facilitating the gathering of data at 
his hospital. 

The rate of admission to the hospital was 
higher for females, so that 61 females were 
used and 39 males. The age range was from 
16 to 60, with an approximate mean of 30 
years. Included were 22 Negroes, 13 male and 
9 female. The sample was very biased with respect 
to urban origin since the large majority 
of the cases were residents of the Chicago 
area. To obtain the desired 100 subjects that 
showed none of the criteria for exclusion, it 
was necessary to screen about 300 suspected 
schizophrenics. The distribution of the state 
hospital diagnoses among the schizophrenic 
subtypes was as follows: 58 mixed, 30 paranoid, 
9 catatonic, 2 hebephrenic, and 1 simple. 
In order to evaluate the importance of the uni- 
que features of the sample with respect to ob- 
tained factors, loadings for case history varia- 
bles were found. The variables included were: 
age, sex, race, onset, previous hospitalization, 
living with spouse, and education. 


Procedure 


The symptoms studied were abstracted 
from discussions of schizophrenia in two 
standard psychiatric textbooks [5, 8]. A total 
of 77 relevant symptoms were used in the col- 
lection of data but the frequencies of occur- 
rence for some were such that only 52 could 
be retained in the final intercorrelation ma- 
trix. The symptoms were defined more exten- 
sively and more exactly [3] but are merely 
labeled for convenience in the body of this re- 
port. 

The subjects were rated for the presence or 
absence of these symptoms by the investigator 
on the basis of one or more mental status in- 
terviews. The questioning was determined 
broadly by the intention of rating these specific 
symptoms yet the interviews were kept as in- 
formal and friendly as possible. It was neces- 
sary to consult with one or more attendants 
for rating some of the symptoms. Throughout, 
an attempt was made to study the symptoms 
at a fixed point in time so that a true 
cross-sectional picture of the individual would 
be obtained. Possible biases in these ratings are 
discussed elsewhere [1, 3]. Consistency between 
two raters was established through the 
method of agreement, showing an 84 per cent 
agreement with a standard error of only 2.0 
per cent. Factor loadings for the hospital diagnoses 
were calculated, but the intrinsic difficulties 
of using diagnosis as a variable have 
been discussed in the original report [3]. 
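
The rater-consistency figures can be illustrated with a short sketch. The text does not say how the standard error was computed; the binomial formula for the standard error of a proportion is assumed here, and the two rating lists are invented. Under that assumption, an agreement of 84 per cent with a standard error of 2.0 per cent corresponds to roughly 336 paired judgments, a figure the report itself does not state.

```python
from math import sqrt


def percent_agreement(ratings_a, ratings_b):
    """Proportion of paired present/absent judgments on which two raters agree."""
    matches = sum(a == b for a, b in zip(ratings_a, ratings_b))
    return matches / len(ratings_a)


def se_of_proportion(p, n):
    """Binomial standard error of a proportion p based on n judgments."""
    return sqrt(p * (1 - p) / n)


def judgments_implied(p, se):
    """Number of judgments at which the binomial standard error of p equals se."""
    return p * (1 - p) / se ** 2


# Hypothetical present (1) / absent (0) ratings of ten symptoms by two raters.
rater_a = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
rater_b = [1, 0, 1, 0, 0, 0, 1, 0, 1, 1]
agreement = percent_agreement(rater_a, rater_b)

# Figures reported in the text: 84 per cent agreement, standard error 2.0 per cent.
implied_n = judgments_implied(0.84, 0.02)
print(agreement, round(implied_n))
```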


Treatment of Data 


The information on the data sheets was 
transferred to McBee punch cards and tetra- 
choric correlation coefficients were graphically 
estimated from Thurstone’s tables. 
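
For readers without the graphical tables, a tetrachoric coefficient can be approximated directly from the fourfold table of two present/absent symptoms. The sketch below uses the cosine-pi approximation, which is a substitute for, and not a description of, the graphical estimation used in this study. The cell counts are invented.

```python
from math import cos, pi, sqrt


def tetrachoric_cos_pi(a, b, c, d):
    """Cosine-pi approximation to the tetrachoric correlation.

    a: both symptoms present    b: first present, second absent
    c: first absent, second present    d: both absent
    """
    if b * c == 0:
        # No discordant pairs in one direction: the approximation goes to +1,
        # unless the concordant product is zero too and nothing can be said.
        return 1.0 if a * d > 0 else float("nan")
    return cos(pi / (1 + sqrt((a * d) / (b * c))))


# Hypothetical fourfold table for two symptoms rated on 100 patients.
r_tet = tetrachoric_cos_pi(40, 10, 10, 40)
print(round(r_tet, 3))
```

The approximation is least trustworthy when a symptom is very rare or very common, which fits the report that only 52 of the 77 symptoms had frequencies suitable for the intercorrelation matrix.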

The multiple-group centroid method of 
factor analysis [9] was employed because it is 
time saving and is likely to produce empiri- 
cally meaningful factors. The clusters which 
determined these multiple-group factors were 
identified by a conventional method which 
would produce fairly tight and relatively independent 
clusters. Estimates of communality 
were obtained by using the highest coefficient 
in each column. Residuals were determined 
after rotating the oblique factor matrix so that 
all factors were orthogonal to one another. 


Table 1 
Six-Factor Oblique Matrix for Symptoms 

Factors: I, Excitement-hostility; II, Psychomotor retardation and withdrawal; 
III, Guilt conflict; IV, Confused-withdrawal; V, Persecuted-suspicious; 
VI, Personality-disorganization. 

Symptoms                    I        II       III      IV       V        VI 

Destructive                 97       45       22       86       26       10 
Aggressive-combative        92       -.44     ~-.82    16       15       12 
Disagreeable-unpleasant     85       -.24     -.81     21       16       12 
Erratic activity level      83       -27      -25      12       .!       .09 
Disturbed sleep             &        -08      -.26     .48      89       01 
Noncooperative              -78      08       .20      .82      14       OT 
Bizarre delusions           44       -38      -.06     -.03     .38      05 
Cries at times              38       11       -.10     .87      .08      22 
Visual hallucinations       26       -26      .06      06       .O1      07 
Concern over inventions     24       -.12     -.08     -.30     18       19 
Retarded movements          -—29     92       .46      46       -.11     -.24 
Low activity level          -29      .£87     .21      84       -.29     -.24 
Seclusive                   -—16     .78      12       41       -—.85    -.22 
No poverty of ideas         07       -.71     27       ©£68     580      .44 
Poor attention              -.16     -70      16       49       -.83     12 
Flattening                  -14      66       .05      28       -21      48 
Poor appetite               -21      .64      -80      15       -.07     -06 
Poor place orientation      -.10     .49      .10      .§0      -38      .05 
Superior-grandiose          42       -48      -.34     -.33     .12      «14 
No narrowing of interests   36       -48      -—23     -—388    39       -.15 
Silly                       09       -40      -838     -.12     26       -39 
Incoherent-irrelevant       16       26       002      08       -08      .09 
Poor recent memory          21       288      .06      41       -24      -.24 
Systematized delusions      04       -24      .06      -35      07       .06 
Concern over right & wrong  —.37     26       1.00     02       07       -20 
Guilt feelings              -.29     47       7        -20      04       12 
Sex concern                 10       .00      -74      12       87       -49 
Underweight                 14       35       86.47    15       12       .33 
Over-religious              .06      .00      88       -.15     -.02     33 
Heedless of needs           04       67       .05      .98      10       16 
Lack of social graces       -58      19       -.06     -78      13       37 
Lack of ambition            04 46 .09 # .78 19 ll 
Perplexed                   St       48       2        S88      09       -.03 
Poor time orientation       35 86686 —iw4CiwH C02 ~—Sti«COS 
Auditory hallucinations     06       -.11     06       -.27     -ll      -.03 
Delusions of persecution    .09      -.18     383      -.11     .87      10 
Suspicious                  13       -.21     -l2      -.09     81       22 
Delusions                   -28      -.81     54       -.10     .80      16 
Misinterpretations          40       -.40     .00      -.34     .73      -.03 
Somatic delusions           -.06     -.20     13       12       .64      16 
Feelings of reference       14-11 04 02 8 .16 
Somatic concern             -.06     -.10     29       -.07     -54      18 
Persistent inapprop. mood   22       -.19     ll       -ll      82       31 
Blunting of ties            -10      .15      .21      .33      -.26     22 
Perseveration of phrases    .02      .08      .38      24       22       81 
Mood-environ. disharmony    04       -.29     -.07     ll       -.14     74 
Preoccupied                 08       -.20     .42      01       6:       20 
Mood-ideas disharmony       -.05     -.28     -.21     -.09     -.26     .51 
Manneristic                 BRB 46 6 01 -.46 
Concern over politics       26       -.16     12       .07      10       40 
Does not work               07       86       16       2        14       «89 
Vague abstract terms        —.11     05       17       -.2      19       22 
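
The first step of a multiple-group analysis can be sketched in a few lines. Each cluster of symptoms defines a group centroid (the unweighted sum of its members), and every symptom is correlated with that centroid. The sketch is an illustration only, not the full procedure of [9]: it stops at these correlations and omits the correlations among factors, the residuals, and the rotation to orthogonality. The four-symptom matrix and the two clusters are invented; the diagonal is filled with the highest coefficient in each column, as the text describes for the communality estimates.

```python
from math import sqrt


def with_highest_r_diagonal(R):
    """Copy of R whose diagonal holds the largest off-diagonal |r| in each column."""
    n = len(R)
    out = [row[:] for row in R]
    for i in range(n):
        out[i][i] = max(abs(R[k][i]) for k in range(n) if k != i)
    return out


def group_centroid_correlations(R, group):
    """Correlation of every variable with the unweighted sum of the variables in `group`."""
    # Variance of the group sum is the sum of all entries in its sub-matrix.
    group_total = sum(R[j][k] for j in group for k in group)
    return [sum(R[i][k] for k in group) / sqrt(group_total) for i in range(len(R))]


# Hypothetical intercorrelations among four symptoms (diagonal is overwritten below).
R = [
    [1.0, 0.6, 0.1, 0.0],
    [0.6, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.5],
    [0.0, 0.1, 0.5, 1.0],
]
clusters = [[0, 1], [2, 3]]   # hypothetical clusters of symptom indices

R_h = with_highest_r_diagonal(R)
structure = [group_centroid_correlations(R_h, g) for g in clusters]
for column in structure:
    print([round(v, 3) for v in column])
```

Each printed list corresponds to one column of an oblique matrix such as Table 1: symptoms inside a cluster correlate highly with their own centroid and weakly with the other.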


Results 


Initial factoring showed that the extraction 
of too many factors (eight) accounted for 
more than the estimated communality. Reason- 
able results were obtained by extracting fewer 
factors. The resulting six-factor oblique ma- 
trix for symptoms is shown in Table 1. 

Too few hebephrenic and simple schizo- 
phrenics were encountered to permit calcu- 
lation of reliable intercorrelation coefficients 
and factor loadings for them. With the exception 
of the factor loadings for the paranoid 
diagnosis of -.70 on psychomotor retardation 
and withdrawal and .58 on persecuted-suspicious, 
the loadings probably are not significant. 

Oblique factor loadings for case-history 
variables indicated that few of the sample char- 
acteristics were important with respect to the 
obtained factors. Exceptions to this were 
found in the correlation of three variables with 
the psychomotor retardation and withdrawal 
factor: age, -.35; sex (male), .42; and not 
living with a spouse, .35. Justification for 
studying Negroes and whites together is seen 
in that the highest correlation of race with 
any of the factors was only .17. 

The total variance accounted for by the six 
orthogonal factors constitutes 57.5 per cent 
of the total variance of the intercorrelation 
matrix or 82.3 per cent of the estimated com- 
munality of these 52 symptom variables. 
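
The two percentages are linked by simple arithmetic. Assuming each of the 52 symptom variables has unit variance (an assumption, since the text does not spell out how total variance was reckoned), the figures imply the mean estimated communality worked out below.

```python
n_symptoms = 52
share_of_total_variance = 0.575   # reported: 57.5 per cent of total variance
share_of_communality = 0.823      # reported: 82.3 per cent of estimated communality

# If the factors take up 57.5% of all variance and that is 82.3% of the common
# variance, the common variance is 57.5 / 82.3 of the total.
implied_mean_communality = share_of_total_variance / share_of_communality

# In units of one standardized variable each (the unit-variance assumption).
variance_accounted_for = share_of_total_variance * n_symptoms
implied_total_communality = implied_mean_communality * n_symptoms

print(round(implied_mean_communality, 3))
print(round(variance_accounted_for, 1), round(implied_total_communality, 1))
```

A mean communality near .70 is in keeping with the statement in the discussion that the communalities were high.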


Discussion of Results 


The high communalities found imply that 
specific factors determining symptoms are not 
of much importance. Hence, instead of single 
symptoms, clusters of symptoms (i. e., fac- 
tors) can be utilized to describe a schizophren- 
ic or a homogeneous group of schizophrenics. 

The location of reference axes is always 
equivocal. The correspondence between factors 
and psychiatric syndromes and constructs, as 
well as the application of cluster analysis to 
locate them, argues for the satisfactoriness of 
the positions of the reference axes. 

The sample of subjects appears to be some- 
what unusual, but the most crucial character- 
istic seems to be the fact that few symptoms of 
a “deteriorated” nature were present in this 
sample. Factor loadings on case-history varia- 
bles suggest that some generalization from 
this study to other groups of schizophrenics 
can be made. 

The six factors have been named induc- 
tively by considering the symptoms which 
showed the highest factor loadings: 

The excitement-hostility factor is character- 
ized by a heightened level of activity, and hos- 
tility and aggression. Hyperactivity may be re- 
lated in part to the motives stemming from the 
ideational content of bizarre delusions. A 
heavy loading for disturbed sleep implies that 
there is an actual disturbance in the activity 
level and that the factor cannot be characterized 
by hostility alone. Catatonic excitements 
probably are responsible for the symptoms 
characterizing this factor. 

The bipolar psychomotor retardation and 
withdrawal factor is characterized by its gen- 
eral lowering of all expressive activity on the 
positive end and by expansive outgoingness on 
the negative end. Symptom loadings suggest 
that it is a control factor (motoric) and not 
one which reflects basic disorganization of per- 
sonality. Catatonic stupors seem to lie on the 
positive end of the factor, while paranoids (es- 
pecially the grandiose type, characteristically 
overtalkative and expansive) are found at the 
negative end. 

The guilt conflict factor is related most 
closely to concern over right and wrong and 
sex concern. Delusions also plays an important 
part here and the delusions probably are ex- 
pressions of guilt over sexual matters. Loadings 
on case-history variables were inspected in the 
hope that this factor might indicate a benign 
condition which stems from adolescent turmoil 
rooted in sexual conflict. Although sexual con- 
flict appears to be important, it does not seem 
to be specifically one of adolescence. Possibly 
this factor represents a prolonged conflict be- 
ginning early but still unresolved in later life. 
It seems possible that further study of this fac- 
tor might have some etiological significance for 
at least some cases of schizophrenia. 

The persecuted-suspicious factor is essen- 
tially a delusional one. The heaviest loading 
of this factor is on delusions of persecution, the 
one symptom that is most pathognomonic of 
paranoid disorders. The diagnosis of paranoid 
schizophrenia had a loading of .58 for this 
factor. 

The personality-disorganization factor is 
characterized by a lack of intellectual control 
over affect and thought. Perseveration of 
phrases, mood-environment disharmony, and 
preoccupied have heavy loadings on this factor. 
The affective flattening of the catatonic stupor 
is not encountered here, but rather there is 
adequate affective show which is inappropri- 
ately expressed. The concept of hebephrenic 
silliness and this factor probably have a large 
overlap. Thought processes are somewhat im- 
paired but the symptom loadings imply a disso- 
ciation rather than “deterioration,” which is in 
contrast to the confused-withdrawal factor 
where intellectual functioning is impaired 
rather grossly. Since this factor was the one 
most closely associated with an insidious onset 
and bore a rather low but positive relation 
with age, one would suspect that hebephrenic 
schizophrenia is represented here. 

The confused-withdrawal factor showed the 
largest loadings on heedless of needs, lack of 
ambition, lack of social graces and poor time 
orientation. This factor correlates .49 with the 
psychomotor retardation and withdrawal factor. 
It seems that the confused withdrawal behavior 
of the simple schizophrenic underlies 
the production of this factor. It is hypothesized 
that the intellectual confusion might be 
primary in producing withdrawal features 
since the patient is intellectually and emotionally 
impaired to the extent that he cannot compete 
with others and maintain a social position. 
If such is the case it would be in contrast 
to the psychomotor retardation and withdrawal 
factor which seems to be an inhibition 
of action. 


Systematic Viewpoints 


Six factors underlying schizophrenic symp- 
tomatology have been derived from the experi- 
mental sample of subjects. How these factors 
are represented among the subjects is not dis- 
closed by this analysis. It is conceivable that 
only one of these factors might be seen in any 
individual or, on the other hand, many indi- 
viduals might show loadings on several factors. 
In practical terms one could speak of a hebe- 
phrenic schizophrenic rather than of merely 
hebephrenic features. 

An inverted factor analysis, a report of 
which is in press [4], confirms the view that 
several factors are required to describe most of 
the individuals in this study. 

It is felt that these factor-analytically de- 
rived concepts should be used without placing 
the restrictions upon them that are encoun- 
tered when they are considered in the Aristo- 
telian diagnostic frame of reference. Rather, 
they should be given the character of response 
variables and employed in research as such. 
Correlation with other response variables 
should result in a topographical map of the 
schizophrenic domain. Investigation of the relationship 
between test variables and these factors 
should prove more fruitful than the conventional 
correlation of test variables with diagnosis. 
Wittenborn and Holzberg’s study 
[11] is illustrative of this approach which re- 
lates test variables and symptom-factors. Such 
systematic applications of factor-analytically 
derived concepts should prove to be of more 
value than employing them as principles of 
classification. 


Summary 


1. Psychiatric classification of cases as 
schizophrenia has been subject to much ques- 
tion. The subjective establishment of syn- 
dromes used in the past has made it desirable to 
verify such proposals by the more objective fac- 
tor analytic techniques. 

2. The present investigation studied the 
occurrence of 52 symptoms in a group of 100 
diagnosed schizophrenics. 

3. A multiple-group centroid factor analysis 
disclosed six factors: Excitement-hostility, 
Psychomotor retardation and withdrawal, 
Guilt-conflict, Persecuted-suspicious, Personal- 
ity-disorganization, and the Confused-with- 
drawal factors. 

4. The systematic application of these fac- 
tor analytically derived concepts is discussed. 
The need is seen for an inverted factor analy- 
sis to demonstrate the practicality of describing 
individual schizophrenics in terms of predominance 
of loading on a single symptom factor. 

Received December Da 1951. 
Dementia Versus Mental Defect in Middle-Aged 
Housewives 


J. P. S. Robertson and H. Wibberley¹ 


Netherne Hospital, Coulsdon, Surrey, England 


Middle-aged women who have always been 
housewives present a special problem in dif- 
ferentiating between presenile dementia in its 
early stages and involutional depression. The 
psychological aspect of the question largely 
turns on establishing whether or not the pa- 
tient has abnormally deteriorated from her 
original level of ability. With women who 
have followed some occupation outside their 
homes, as with men, original intellectual sta- 
tus can be fairly well established from occu- 
pational level and success, together with spare- 
time interests. It is very difficult, however, to 
estimate the original ability of a woman who 
has been a housewife since her early twenties, 
too busy to have more than a limited range of 
spare-time interests. 

The problem is most acute with those likely 
always to have been below average in ability, 
for example in distinguishing demented dull 
women from depressed but undeteriorated 
mental defectives. Psychometric tests might be 
expected to give a decisive answer here. Re- 
cently, however, much doubt has been ex- 
pressed that the ratios calculated from stand- 
ard tests are trustworthy in what they say 
about original ability and deterioration in in- 
nately dull or defective persons, cf. Rabin and 
Guertin [7]. The following inquiry was pur- 
sued to ascertain what tests best distinguish, 
among middle-aged housewives, demented per- 
sons who were originally dull in ability from 
undeteriorated mental defectives and unde- 
mented dull persons. 


Patients Investigated: Establishment of 
Original Ability 

The inquiry was applied to female in-patients 
at Netherne Hospital between the ages 
of 40 and 60 who had been occupied exclusively 
as housewives from early maturity up 
to their admission. Two classes were contrasted: 
those in whom a neurological basis for dementia 
had definitely been established either 
of cerebral atrophy or of general paresis of the 
insane (GPI) and those who had been unequivocally 
diagnosed as involutional depressives. 
The latter, according to reasonable presumption, 
were intellectually undeteriorated 
apart from the effects of normal aging. 


1 The authors’ thanks are due to Dr. R. K. Freudenberg, 
Physician Superintendent, Netherne Hospital, 
for facilities given to carry out this research. 

For subdividing these persons according to 
original ability it would have been an advantage 
to have at hand IQ’s obtained in childhood 
and adolescence. In England such information 
is rarely or never available. It was 
necessary to collect evidence retrospectively 
about the abilities of the patients in early life. 

The fullest possible data were gathered by 
the Research Social Worker about each patient 
in the following regards: 

(a) the intellectual stimulation and opportunities 
offered by her home environment in 
childhood and up to marriage, and her response 
to these; 

(b) her general status during her elementary 
school career, in the opinion of her schoolteachers, 
and her particular status in individual 
subjects; 

(c) her status in postelementary education, 
evening classes, etc., if any; 

(d) her career at work up to her marriage, 
in the opinion of employers (in most cases this 
was domestic work); 

(e) her management of home and children 
in the earlier years of her marriage, cooking, 
cleaning, shopping, management of housekeeping 
money, etc.; 

(f) her spare-time activities throughout 
her life, sewing, reading, tastes in radio or 
cinema programs, etc. 


The evidence was obtained from a minimum 
of three independent informants in each case 
from among relatives, neighbors, employers, 
and former schoolteachers. Usually there were 
at least five informants, including three who 
were not relatives. School reports were ob- 
tained where available. 

The material thus collected was submitted 
to two psychologists and two schoolteachers 
for independent rating as to original general 
intelligence on a scale: Mentally Defective, 
Dull, Average, Above Average. The patients 
were classified according to the mean of the 
four ratings. 

This procedure was continued until there 
were 8 patients in each of the following classes: 

(a) demented, formerly dull (7 cerebral 
atrophy, 1 GPI); 

(b) dull but undeteriorated (depressed); 

(c) defective but undeteriorated (depressed). 


It would have been highly desirable to col- 
lect a much more substantial number of cases 
but the time and expense involved in such 
inquiries precluded this. For the same reason 
it was impossible to secure an adequate num- 
ber of demented average and demented de- 
fective cases. 


Testing of Patients 


Each of the 24 patients was systematically 
questioned concerning early abilities, achieve- 
ments, and interests. This self-evaluation, ow- 
ing to the excessive modesty of some and the 
complacency of others, showed little relation- 
ship to the independently obtained evidence. 

The following tests were applied to each 
patient: 


(a) the complete Terman-Merrill, Form 
L [11]; 

(b) the complete Wechsler-Bellevue, 
Form I [12]; 

(c) items 8, 10, 12, 13, 16, 17, 22, and 23 
from the Babcock Scale (1930) [2]; 

(d) the Shipley-Hartford Vocabulary and 
Abstraction tests [10]; 

(e) Raven’s Progressive Matrices (1938) 
and Mill Hill Vocabulary [8, 9]; 

(f) the Alexander Battery (Block Design, 
Passalong, Cube Construction) [1]; 

(g) the Weigl-Goldstein-Scheerer Color 
Form Sorting Test [6]; 

(h) the Goldstein Tachistoscopic Test 
[5]; 

(i) the Bender visual motor Gestalt Test 
[3]. 


Where means could be calculated in a test 
or subtest, the groups were compared by Student’s 
t ratio. Otherwise they were compared 
by the exact binomial-product method [4]. 
The results may be summarized as follows. 

The dull demented were statistically indistinguishable 
from the dull undeteriorated but 
distinguishable at the 5% level from the mentally 
defective in: all vocabulary tests (Terman-Merrill, 
Wechsler-Bellevue, Shipley-Hartford, Mill Hill); 
digits backwards and forwards in all batteries; 
Terman-Merrill Response to Pictures I and II, 
Rhymes, Minkus Completion, Induction, 
Problems of Fact, Abstract Words, 
Arithmetical Reasoning, Proverbs I and II, 
Enclosed Box Problem and Sentence Building; 
Wechsler Information, Comprehension and 
Similarities; Babcock Immediate Reproduction (16), 
Sentences (22) and Paired Associates (17); 
Goldstein Familiar Words (I) and Numbers (V). 

The dull demented were distinguishable at 
the 5% level from the dull undeteriorated but 
indistinguishable from the mentally defective 
in: Terman-Merrill Memory for Stories, 
Memory for Designs and Word Naming; 
Wechsler Picture Arrangement, Object As- 
sembly, Block Design and Digit Symbol; Pro- 
gressive Matrices; Alexander Block Design; 
and Bender Gestalt. 

The dull demented were intermediate be- 
tween the dull undeteriorated and the defec- 
tive and distinguishable from both at the 5% 
level in: Terman-Merrill Picture Absurdities, 
Verbal Absurdities, Similarities and Memory 
for Sentences (Total); and Goldstein Point 
Figures (VII) and Meaningful Objects 
(VIII). In the other tests and subtests there 
were either no statistically significant differ- 
ences or there were differences between the 
undeteriorated dull and the defective only. 
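
As an illustration of the Student's t comparison described under Testing of Patients, the following Python sketch computes a pooled-variance t ratio for two groups of 8. The scores are hypothetical and are not data from this study. The function name student_t, the pooled-variance form, and the two-tailed 5% criterion are assumptions, since the paper does not state which form of the test was used.

```python
import math
from statistics import mean, variance


def student_t(group_a, group_b):
    """Return (t, df) for a pooled-variance Student's t ratio.

    Assumes two independent groups with roughly equal variances.
    """
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / df
    std_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / std_error, df


# Hypothetical subtest scores for two groups of 8 patients (illustration only).
dull_demented = [12, 14, 11, 15, 13, 12, 16, 11]
defective = [8, 9, 7, 10, 9, 8, 11, 10]

t, df = student_t(dull_demented, defective)

# Two-tailed critical value of t at the 5% level for 14 degrees of freedom.
CRITICAL_T_05_DF14 = 2.145
significant = abs(t) > CRITICAL_T_05_DF14

print(f"t = {t:.3f}, df = {df}, significant at 5%: {significant}")
```

With 8 patients per group each comparison has 14 degrees of freedom, so a difference counts as significant at the 5% level (two-tailed) only when the absolute t ratio exceeds 2.145.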




Discussion 


These results are in accord with commonly 
held opinion about dementia and original ability. 
They suggest that with dull or defective 
middle-aged housewives, as with others, origi- 
nal ability is best reflected in tests involving 
operations on verbal material and dementia is 
most clearly shown in operations on visuo- 
spatial material. They also suggest that with 
dull or defective persons evidence from mem- 
ory tests is equivocal in regard to deteriora- 
tion. The Arithmetic results indicate a possi- 
bility that mastery of the Four Rules might 
differentiate the dull demented from the men- 
tally defective. 

It is disappointing that no performance or 
practical tests seem to reflect former ability. 
Perhaps information tests could be developed 
for middle-aged housewives which would 
throw light on their knowledge of running a 
house, cooking, sewing, etc. These might point 
to their former practical abilities just as in- 
formation tests about a trade point to the effi- 
ciency of the craftsman. 


Summary 


1. A statement is made of the problem of 
distinguishing dull demented from mentally 
defective persons among middle-aged house- 
wives. 

2. A group of dull demented patients was 
contrasted in a number of standard psycho- 
metric tests with groups of dull undeteriorated 
and defective undeteriorated patients, all be- 
ing former housewives. Original ability was 
established by careful biographical inquiry and 
the independent ratings of several judges. 

3. It was found that the original ability 
of the dull demented housewives was best reflected 
in tests involving verbal material, their 
deterioration in tests involving visuo-spatial 
material. 

4. It is suggested that information tests 
might be developed to throw light on the form- 
er practical abilities of such patients. 


Received February 8, 1952. 
New Books and Tests 


Books 


Bond, Douglas D. The love and fear of flying. New 
York: International Universities Press, 1952. Pp. 
190. $3.25. 


A former Eighth Air Force psychiatrist, Bond 
does a lively job in communicating his understand- 
ing of the psychology of combat flying. The love 
of flying for its own sake, an almost universal 
feeling among aviation personnel, has rarely been 
given as thoughtful a psychological analysis. In his 
interpretation of combat fear in flyers, Bond empha- 
sizes the impact of situational dangers and mini- 
mizes the importance of neurotic predispositions. 
The weeding out of the “predisposed,” he shows, 
would have eliminated about as many successful 
flyers as unsuccessful ones. The neuroses of flight, 
mainly phobic in form, arose chiefly from situation- 
al conflicts involving the symbolic meaning of flight 
to the flyer, or from aggressive attitudes toward 
friends who were killed or wounded. The book has 
implications for military selection and for the treatment 
of psychological casualties.—L. F. S. 


Curran, Charles A. Counseling in Catholic life and 
education. New York: Macmillan, 1952. Pp. 
xxvi + 462. $4.50. 


Father Curran’s title expresses well the contents 
of this volume. He fuses the methods and findings 
of Rogerian nondirective counseling with Thomas 
Aquinas’ philosophical framework. The concept of 
prudence is indicated as the core of self-organization. 
The major portions of the book deal with 
“the processes of personal integration through 
counseling,” “the skill of the counselor,” (relation- 
ship, counseling dynamics and phases) and “the 
approach to counseling” (counseling atmosphere, ex- 
pression of counseling needs, counseling and in- 
formation, and group counseling). There is no at- 
tempt to integrate the entire literature of the field 
into the presentation, but contemporary writings are 
somewhat widely sampled. There are numerous 
illustrative excerpts from counseling dialogues. 
Progressive members of the Catholic Apostolate will 
welcome this book.—F. McK. 


Note: The reviews were prepared by the Editor 
and the Associate Editors, who may be identified by 
their initials. 


Kaplan, Louis, & Baron, Denis. Mental hygiene and 
life. New York: Harper, 1952. Pp. xiv + 422. 
$3.50. 


This book aims to meet the demands of the 
increasingly popular courses in mental hygiene and 
personal adjustment. These courses can serve real 
needs of the college student. He often earnestly 
seeks understanding of his anxieties and self-defeat- 
ing traits. Some of his courses, he hopes, will help 
him do this. Kaplan and Baron state their plan as 
a simplification of the fundamental principles with- 
out specific emphasis on application. Chapters deal 
with mental deterioration, the mental-hygiene move- 
ment, personality, as well as the concepts usually 
presented on dynamic psychology, needs, frustration, 
conflict, and adjustive mechanisms. Texts in this 
field attack a major and difficult problem. It is to 
discover the kind of written presentation that can 
best assist individuals in their personal and social 
adjustment. It appears at present this consists of 
aiding one to discover his inner trends at the rate 
and under the conditions which he can best integrate 
them effectively for self-control. It should be be- 
yond the presentation of concepts and principles of 
adjustment often included in existing introductory 
courses. In the reviewer’s opinion this book ap- 
proaches its goal. However, it could have met the 
aim better had it contained more case material, had 
pointed to more relevant and important daily ap- 
plication of the principles given, and had been 
structured more toward individualized patterns suggested 
in the literature of group therapy.—F. McK. 


National Manpower Council. Student deferment 
and national manpower policy. New York: 
Columbia Univer. Press, 1952. Pp. x + 102. $2.00. 


A thoughtful statement of policy on deferment, 
backed by a digest of relevant studies prepared by 
the Council’s research staff. The report suggests, 
but does not discuss in detail, an issue of wide in- 
terest to clinical psychologists. With a limited man- 
power pool, who shall be deferred: students of ex- 
ceptional ability and positive social potential, or the 
psychologically handicapped whose induction might 
lead to social risks? The problem of “screening” involves 
a philosophy as well as a technique.—L. F. S. 


Shneidman, Edwin S., Joel, Walther, & Little, 
Kenneth B. Thematic test analysis. New York: 
Grune & Stratton, 1951. Pp. xi + 320. $8.75. 


Research on the single case is often discussed but 
rarely achieved. In this remarkable volume, fifteen 
well-qualified contributors give their blind inter- 
pretations of the TAT and MAPS protocols of 
“John Doe,” an examinee identified to them only by 
sex, age, and marital status. Appended are his 
Rorschach, Wechsler, MMPI, Draw-a-Person, and 
Bender-Gestalt performances, each interpreted by 
one worker, and the clinical history and psycho- 
therapeutic notes from his contacts with a VA hos- 
pital and clinic. The first impression of a psycho- 
logical reader is of the richly revealing nature of 
the thematic materials, sensed in the raw protocols 
and amplified in the interpretations. The varied 
qualities of the analyses used by the contributors, 
which range widely from quantitative to intuitive, 
also stand out; on the whole, the intuitive ap- 
proaches seem to come out best. As a textbook in 
thematic analysis, the volume can offer much to stu- 
dents. Yet the enthusiasm for the study must be 
tempered by some cautious reservations. In spite of 
the authors’ circumspect intentions to the contrary, 
the presentation tends to seduce the reader to a 
greater faith in thematic tests than the coldly con- 
sidered facts should permit. “John Doe” is a com- 
plex character, blending anxiety, obsessive-compul- 
sive, depressed, homosexual, schizoid, paranoid, and 
not a few other features. The hospital and clinic 
could not really decide whether he was psycho- 
neurotic or schizophrenic; neither could the the- 
matic test analysts. Such a case provided a maxi- 
mum of clinical “richness” and a minimum of 
critically regarded certainty. A full evaluation of 
a clinical technique still awaits the evolution of re- 
search methods that will bridge the present dilem- 
ma between scope and precision.—L. F. S. 


Victor, Frank. Handwriting: A personality pro- 
jection. Springfield, Ill.: Charles C Thomas, 
1952. Pp. xii + 149. $3.75. 


Victor’s theory of graphology has a number of 
features that seem to commend it to psychologists: 
handwriting is regarded dynamically as expressive 
movement; the major concepts of release, tension, 
and energy are harmonious with all important 
systems of psychological theory. Other aspects are 
not so attractive: for example, the symbolic val- 
ues of above (high, ideal) and below (low, im- 
perfect) are applied arbitrarily to movements above 
and below the base line of handwriting. The 
validity of the system is supported only by arguments, 
cases, and appeals to “the collective experience 
of three generations of graphologists.” A concluding 
note specifically rejects the experimental 
validation of graphology as “futile,” leaving the 
critical reader with the impression that, as a verifiable 
and dependable technique, graphology is 
futile, too.—L. F. S. 


Wolff, Werner. The dream — mirror of conscience. 
New York: Grune & Stratton, 1952. Pp. vii + 
348. $8.50. 


An interesting book about dreaming which sum- 
marizes historical concepts of the dream, reviews 
modern approaches, gives numerous cases, and offers 
a contribution to theory. A dream, according to 
Wolff, is a product of inclusion rather than ex- 
clusion, which merges the dreamer’s thought, prob- 
lem, and self. The proper interpretation of a dream 
is therefore its synthesis, not its “analysis.”—L. F. S. 


Zerfoss, Karl P. (Ed.) Readings in counseling. 
New York: Association Press, 1952. Pp. viii + 
639. $6.00. 


Several hundred readings in counseling, selected 
from books, periodicals, and other less available 
sources, have been compiled in one convenient vol- 
ume. The editor has considered the needs of the 
general worker rather than those of the specialist, 
and has placed emphasis on readings that present 
the normal problems of mentally healthy individu- 
als. There is an excellent bibliography and a con- 
tent index that should prove especially helpful in 
making the material readily available to the user. 
The book should prove valuable to those whose responsibility 
is to guide young adults and adolescents.—B. M. L. 


Tests 


Arthur, Grace. The Arthur Adaptation of the 
Leiter International Scale. Manual, pp. viii + 
73 ($3.00) Washington: Psychological Service 
Center Press, 1952. 


Although this manual contains little that is new, 
it brings together in useful form the material need- 
ed for the administration and scoring of the Arthur 
Adaptation. The Introduction consists of a reprint 
of Arthur’s article describing the Scale (J. clin. Psychol., 
1949, 4, 345-349). After some general instructions 
that are fairly obvious to anyone familiar 
with psychometrics, the remainder of the book con- 
sists of the instructions for the adapted parts of the 
test, taken from Leiter’s 1940 Manual. In spite of 
twelve years’ use, the status of the Leiter test re- 
mains uncertain. Arthur’s contribution provides 
nothing more than the assurance that it was re- 
standardized on 289 “middle-class” children, about 
55 at each age from 3 to 7. Data on reliability and 
validity are still lacking. Arthur’s table which 
shows the median Heinis Personal Constant for 48 
children remaining substantially constant, in a comparison 
of the Arthur-Leiter with the Arthur Form 
I and Stanford-Binet three years later, certainly 
does not show that the test gives “a dependable 
basis for prediction,” as she states in her article. 
It may be hoped that other research studies will 
yield more objective information on this apparently 
promising technique.—L. F. S. 


Roeber, Edward C., & Prideaux, Gerald G. Voca- 
tional Interest Analyses. Grade 9-Adult. 1 form 
for each of six areas: Personal-social, Natural, 
Mechanical, Business, Arts, and Sciences. Un- 
timed, (35) min. each. Test booklets ($2.00 per 
25) with manual, pp. 12; IBM answer sheets 
(8¢ ea.); specimen set (75¢). Los Angeles: 
California Test Bureau, 1951. 


The appearance of these blanks marks a new low 
point in psychological test publication. They pur- 
port to be an extension of the publisher’s Occupa- 
tional Interest Inventory, which yields scores in six 
broad areas. Each of the Analyses breaks down an 
area into finer subdivisions; the personal-social 
analysis, for example, divides into domestic service, 
personal service, social service, teaching, law enforcement, 
and health services. Each blank contains 
120 forced two-choice items; 40 choices represent 
each of the six subareas. The advertising folder 
proclaims in large type that these blanks are 
“reliable” and “valid,” but a glance in the manual 
tells a different story. Reliability is claimed by 
analogy from another blank; studies of the reli- 
ability of the present form are only reported as 
“under way.” The discussion of validity is a de- 
fense of the a priori selection of items. The manu- 
al hedges by asserting that “there are no absolutely 
objective criteria for establishing the validity of an 
interest inventory.” Perhaps there are no “abso- 
lutely objective” criteria, but there are useful meth- 
ods of validation, as well attested by the work of 
Strong and Kuder. The final indignity is that the 
manual gives no norms. The advisability of pro- 
viding norms “is being further investigated.” Mean- 
while, interpretations are made in terms of the rank 
order of raw scores, without even any information 
on the relative drawing power of the items in the 
areas. The publication of such an instrument 
would be justified only if it were clearly marked as 
an experimental form for restricted use during a 
process of standardization. To offer these blanks 
for sale as finished products, with seductive adver- 
tising likely to mislead unsophisticated purchasers, 
is a serious threat to the integrity of professional 
psychology.—L. F. S. 


Books Received 


Anderson, Irving H., & Dearborn, Walter F. The 
psychology of teaching reading. New York: Ron- 
ald Press, 1952. Pp. x + 382. $4.75. 

Björsjö, Märta. Om spatial, teknisk och praktisk 
begåvning. Göteborg, Sweden: Elanders Boktryckeri 
Aktiebolag, 1951. Pp. 270. 15 kr. 


Boisen, Anton T. The exploration of the inner 
world. New York: Harper, 1936, 1952. Pp. xiii 
+ 322. $4.00. 

Brown, Harvey Q. The enchanted castle. Boston: 
Bruce Humphries, 1951. Pp. vi + 1038. $3.75. 


Connelly, Thomas R., & Lang, Paul E. Guidance 
bibliography. Newark, N. J.: Washington Irving 
Publishing Co., 1951. Pp. 25. 50¢. 

Faris, Robert E. L. Social psychology. New York: 
Ronald Press, 1952. Pp. vii + 420. $5.00. 


Goldschmidt, Richard B. Understanding heredity. 
New York: Wiley, 1952. Pp. ix + 228. $3.75. 

Gray, J. Stanley. Psychology in industry. New 
York: McGraw-Hill, 1952. Pp. vii + 401. 
$5.00. 

Rubin, Edgar. Experimenta psychologica. Copen- 
hagen, Denmark: Ejnar Munksgaard, 1949. Pp. 
356. $4.75. 

Schacter, Helen. Understanding ourselves. Bloom- 
ington, Ill.: McKnight & McKnight, 1952. Pp. 
124. 70¢. 

Von Foerster, Heinz. (Ed.) Cybernetics. Trans- 
actions of the Eighth Conference, March 15-16, 
1951. New York: Josiah Macy, Jr. Foundation, 
1952. Pp. xx + 240. $4.00. 